
Journal of University of Chinese Academy of Sciences ›› 2023, Vol. 40 ›› Issue (1): 12-20. DOI: 10.7523/j.ucas.2021.0016

• Mathematics and Physics •

Variable selection method for high-dimensional survival error-in-variable data

ZHANG Jiarui1,2, WU Yaohua1

  1. School of Management, University of Science and Technology of China, Hefei 230026, China;
  2. Zhejiang Institute of Research and Innovation, University of Hong Kong, Hangzhou 310000, China
  • Received: 2020-12-30 Revised: 2021-03-08 Published: 2021-05-31
  • Corresponding author: ZHANG Jiarui, E-mail: zjrt46@mail.ustc.edu.cn
  • Supported by:
    National Natural Science Foundation of China (72071187, 11671374, 71731010, 71921001)

Abstract: The analysis of censored survival data is an important part of high-dimensional sparse regression. Much of the existing theoretical and applied work assumes clean data, yet in practice data are often corrupted by missing values or measurement error, so methods for such data are of greater practical use. In the literature on high-dimensional survival data, relatively little work has addressed variable selection in the presence of measurement error. We propose a method for variable selection in the high-dimensional additive hazards model with error-in-variable data, which combines the pseudoscore function with the nearest positive semi-definite projection. Simulation studies and a real data analysis show that the method performs well and successfully identifies the nonzero coefficients.

Key words: variable selection, high-dimensional, additive hazards model, error-in-variable data
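
The abstract names a nearest positive semi-definite projection but gives no detail, so the following is only an illustrative sketch under stated assumptions, not the authors' procedure. It assumes a plain Gram matrix as a stand-in for the matrix in the pseudoscore function, additive measurement error with known standard deviation `noise_sd`, and a Frobenius-norm projection by eigenvalue clipping; the paper may use a different matrix, error model, or norm. The helper name `nearest_psd` and all data are hypothetical. The point is that subtracting the error covariance from a high-dimensional Gram matrix can produce negative eigenvalues, which a projection onto the positive semi-definite cone removes.

```python
import numpy as np


def nearest_psd(sym_matrix, floor=0.0):
    """Project a symmetric matrix onto the PSD cone in Frobenius norm.

    Hypothetical helper: eigenvalues below `floor` are raised to `floor`.
    """
    sym = (sym_matrix + sym_matrix.T) / 2.0      # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(sym)       # eigen-decomposition
    clipped = np.clip(eigvals, floor, None)      # remove negative eigenvalues
    return (eigvecs * clipped) @ eigvecs.T       # rebuild V diag(clipped) V^T


# Simulated covariates with more variables than observations (p > n).
rng = np.random.default_rng(0)
n, p = 20, 50
noise_sd = 0.5                                   # assumed known error SD
x_true = rng.normal(size=(n, p))                 # unobserved clean covariates
w_obs = x_true + rng.normal(scale=noise_sd, size=(n, p))  # observed, noisy

# Bias-corrected Gram matrix: subtract the measurement-error covariance.
# With p > n the uncorrected matrix is rank-deficient, so the correction
# pushes the zero eigenvalues below zero.
sigma_hat = w_obs.T @ w_obs / n - noise_sd**2 * np.eye(p)
sigma_psd = nearest_psd(sigma_hat)

min_before = np.linalg.eigvalsh(sigma_hat).min()
min_after = np.linalg.eigvalsh(sigma_psd).min()
print(f"smallest eigenvalue before projection: {min_before:.4f}")
print(f"smallest eigenvalue after projection:  {min_after:.4f}")
```

In a penalized estimation step, the projected matrix would take the place of the corrected one so that the objective stays convex; how the paper does this is not stated on this page.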
