
Journal of University of Chinese Academy of Sciences


Robust subgroup analysis based on Huber loss for high-dimensional heterogeneous data*

SU Jing, HAN Chao, ZHANG Weiping

  1. Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
  • Received: 2025-03-25  Revised: 2025-06-19
  • Corresponding author: E-mail: lisha1400@bao.ac.cn
  • Funding:
    *Supported by the National Key R&D Program of China (2021YFA1600500), the Strategic Priority Research Program for Basic and Interdisciplinary Frontier Research of the Chinese Academy of Sciences (XDB0560301), and the National Natural Science Foundation of China (11903055, U1931127)



Abstract: Based on the general linear regression model, this paper considers the setting in which the individual intercepts are heterogeneous and the covariates are high-dimensional. To cope with anomalous observations and improve the robustness of the model, we adopt the Huber loss function. We also propose a center-based penalty to identify latent subgroups and use a concave (nonconvex) penalty to select covariates. On the algorithmic side, we design a new hybrid algorithm that combines the alternating direction method of multipliers (ADMM) with coordinate descent to minimize the objective function. On the theoretical side, we establish the asymptotic properties of the oracle estimator and rigorously prove its relationship to the objective function, which guarantees the effectiveness of the proposed method for latent subgroup identification and variable selection. Numerical simulations and a real data analysis demonstrate the robustness and effectiveness of the proposed method for subgroup identification and high-dimensional data processing.
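The robustness claim rests on the Huber loss being quadratic for small residuals and linear for large ones, so its score function ψ (the clipped residual) is bounded and a single gross outlier cannot dominate the fit. The sketch below is illustrative only and is not the authors' algorithm: it assumes the convention ρ_δ(r) = r²/2 for |r| ≤ δ and δ|r| − δ²/2 otherwise, assumes the conventional tuning constant δ = 1.345 with residuals already on unit scale (no scale estimate is computed), and fits a single location parameter by iteratively reweighted least squares (IRLS). The function names `huber_loss`, `huber_psi` and `huber_location` are hypothetical.

```python
"""Huber loss, its score function, and a robust location estimate (illustrative sketch)."""
from statistics import mean, median


def huber_loss(r, delta=1.345):
    """Quadratic for |r| <= delta, linear beyond it (continuous at |r| = delta)."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * a - 0.5 * delta * delta


def huber_psi(r, delta=1.345):
    """Derivative of huber_loss: the residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, r))


def huber_location(y, delta=1.345, tol=1e-10, max_iter=200):
    """Solve sum_i psi(y_i - mu) = 0 by IRLS with weights psi(r) / r."""
    mu = median(y)  # robust starting value
    for _ in range(max_iter):
        # weight 1 in the quadratic zone, delta / |r| in the linear zone
        w = [1.0 if abs(v - mu) <= delta else delta / abs(v - mu) for v in y]
        mu_new = sum(wi * vi for wi, vi in zip(w, y)) / sum(w)
        if abs(mu_new - mu) < tol:
            return mu_new
        mu = mu_new
    return mu


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]    # one gross outlier
    print("mean :", mean(data))            # pulled toward the outlier
    print("huber:", huber_location(data))  # stays with the bulk of the data
```

Because the outlier's contribution to the estimating equation is capped at δ, the Huber estimate here stays at the center of the four clean points, while the sample mean is dragged far away.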

Key words: Huber loss, subgroup analysis, high-dimensional data, oracle property
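The abstract says covariates are selected with a concave penalty but does not name it; the sketch below assumes the minimax concave penalty (MCP) with hypothetical values γ = 3 and λ = 0.1. It shows one way a coordinate-descent step can combine that penalty with the Huber loss: since ψ is 1-Lipschitz, each coordinate's loss is majorized by a quadratic of curvature 1 when columns are standardized so that (1/n)Σ x_ij² = 1, and the resulting one-dimensional problem is solved exactly by the MCP ("firm") thresholding rule. This majorize-minimize simplification drops the heterogeneous intercepts and the ADMM step for the center-based penalty, so it is not the paper's hybrid algorithm; all function names are hypothetical.

```python
"""Coordinate descent for Huber loss + MCP (illustrative sketch, intercepts omitted)."""


def huber_psi(r, delta=1.345):
    """Score of the Huber loss: residual clipped to [-delta, delta]."""
    return max(-delta, min(delta, r))


def mcp_penalty(t, lam, gamma):
    """MCP: lam*|t| - t^2/(2*gamma) if |t| <= gamma*lam, else gamma*lam^2/2.

    Not needed by the update below; included to show the penalty itself.
    """
    a = abs(t)
    return lam * a - a * a / (2 * gamma) if a <= gamma * lam else 0.5 * gamma * lam * lam


def mcp_threshold(z, lam, gamma):
    """Exact minimizer over b of 0.5*(b - z)^2 + mcp_penalty(b, lam, gamma), gamma > 1."""
    if abs(z) > gamma * lam:
        return z  # no shrinkage for large signals
    soft = max(abs(z) - lam, 0.0) * (1 if z >= 0 else -1)  # soft-thresholding
    return soft / (1 - 1 / gamma)


def huber_mcp_cd(X, y, lam=0.1, gamma=3.0, delta=1.345, tol=1e-10, max_iter=500):
    """X is a list of rows whose columns satisfy mean(x_ij^2) = 1 for every j."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    r = list(y)  # residuals y - X beta
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # gradient step on the quadratic majorizer (curvature bound 1)
            z = beta[j] + sum(X[i][j] * huber_psi(r[i], delta) for i in range(n)) / n
            new = mcp_threshold(z, lam, gamma)
            diff = new - beta[j]
            if diff != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * diff  # keep residuals in sync with beta
                beta[j] = new
            max_change = max(max_change, abs(diff))
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]  # orthogonal, standardized
    y = [3.0 * row[0] for row in X]  # only the first covariate matters
    print(huber_mcp_cd(X, y))  # first coefficient near 3, second exactly 0
```

The design choice worth noting is that MCP leaves coefficients with |z| > γλ unshrunk, which is what allows concave penalties to reach oracle-type behavior, whereas the irrelevant second covariate is thresholded to exactly zero.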
