Journal of University of Chinese Academy of Sciences, 2024, Vol. 41, Issue (2): 151-164. DOI: 10.7523/j.ucas.2022.037

• Mathematics and Physics •

Robust individualized subgroup analysis

ZHANG Xiaoling, REN Mingyang, ZHANG Sanguo

  1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-02-08 Revised: 2022-04-13 Published: 2022-04-26
  • Corresponding author: ZHANG Sanguo, E-mail: sgzhang@ucas.ac.cn
  • Supported by: National Natural Science Foundation of China (12171454) and Key R&D Program of Guangxi (2020AB10023)

Abstract: Subgroup analysis of heterogeneous populations is a crucial step in developing individualized treatments and personalized marketing strategies. Regression-based approaches form one of the main schools of subgroup analysis: this paradigm divides the predictor variables into a part with heterogeneous effects and a part with homogeneous effects, and assigns samples that share the same heterogeneous-effect coefficients to the same subgroup. However, most existing regression-based subgroup analysis methods have two major limitations. First, they still treat the samples within a subgroup as homogeneous and do not fully account for individual effects. Second, they ignore the contamination that commonly affects the homogeneous-effect variables, which can introduce substantial bias into the model estimates. To address these challenges, we propose a robust individualized subgroup analysis method. We use a multidirectional separation penalty function to estimate individualized effects in the heterogeneous part of the model, and we use γ-divergence to obtain robust estimates for the contaminated homogeneous part. We also propose an efficient alternating iterative two-step algorithm that combines coordinate descent with the alternating direction method of multipliers (ADMM). Simulation studies and an analysis of a skin cutaneous melanoma dataset further demonstrate the effectiveness of the proposed method.

Key words: subgroup analysis, multidirectional separation penalty, robust regression, variable selection
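The abstract names γ-divergence as the source of robustness for the homogeneous part but does not give the estimator. As a minimal sketch, assume a plain linear model y = Xβ + ε with Gaussian errors and leave out the heterogeneous part, the multidirectional separation penalty, and the paper's two-step algorithm entirely. Under these assumptions, the stationarity conditions of the standard γ-cross-entropy objective lead to a reweighted least-squares fixed point. With residuals r_i = y_i − x_iᵀβ and weights w_i ∝ exp(−γ r_i² / (2σ²)) normalized to sum to one, the conditions are β = (XᵀWX)⁻¹XᵀWy and σ² = (1 + γ) Σ_i w_i r_i². Observations with large residuals receive exponentially small weight, which is how contamination is suppressed. The function name `gamma_divergence_regression`, its defaults (γ = 0.5, an OLS start with a MAD-based scale), and the simulated demo data are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def gamma_divergence_regression(X, y, gamma=0.5, max_iter=200, tol=1e-8):
    """Robust linear regression via gamma-divergence, Gaussian errors assumed.

    Hypothetical helper for illustration; returns (beta, sigma2, weights).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Initial beta: ordinary least squares (may be pulled toward outliers).
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    # Initial scale: MAD of the OLS residuals, so outliers do not inflate it.
    r = y - X @ beta
    sigma2 = (1.4826 * np.median(np.abs(r - np.median(r)))) ** 2
    w = np.full(len(y), 1.0 / len(y))
    for _ in range(max_iter):
        r = y - X @ beta
        # w_i proportional to exp(-gamma * r_i^2 / (2 sigma^2));
        # subtracting the max log-weight avoids underflow before normalizing.
        logw = -gamma * r**2 / (2.0 * sigma2)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # Weighted least-squares step: solve (X^T W X) beta = X^T W y.
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ y)
        # Scale step: sigma^2 = (1 + gamma) * sum_i w_i r_i^2 at the new beta.
        r_new = y - X @ beta_new
        sigma2 = (1.0 + gamma) * np.sum(w * r_new**2)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, sigma2, w


if __name__ == "__main__":
    # Hypothetical demo data: y = 1 + 2x + noise, 10% of responses shifted by +20.
    rng = np.random.default_rng(0)
    n = 200
    x = rng.uniform(-2.0, 2.0, n)
    X = np.column_stack([np.ones(n), x])
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)
    y[:20] += 20.0
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    beta, sigma2, w = gamma_divergence_regression(X, y)
    print("OLS:", beta_ols)
    print("gamma-divergence:", beta, "sigma^2:", sigma2)
    print("total weight on contaminated rows:", w[:20].sum())
```

The parameter γ trades efficiency for robustness. As γ → 0 the weights become uniform and the fit reduces to ordinary least squares, while larger γ downweights large residuals more aggressively. Because the objective is nonconvex, a robust starting scale matters: a start inflated by outliers may lead the iteration to a different stationary point.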