
Journal of University of Chinese Academy of Sciences ›› 2022, Vol. 39 ›› Issue (3): 289-301. DOI: 10.7523/j.ucas.2020.0056

• Mathematics •

Robust ordinal mislabel logistic regression based on γ-divergence

GUO Meijun1,2, REN Mingyang1,2, LI Shiming3, ZHANG Sanguo1,2

  1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China;
  2. Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100049, China;
  3. Beijing Tongren Eye Center, Beijing Tongren Hospital, Beijing Ophthalmology & Visual Science Key Laboratory, Beijing Institute of Ophthalmology, Capital Medical University, Beijing 100730, China
  • Received: 2020-01-03 Revised: 2020-04-13 Published: 2021-06-04
  • Corresponding author: ZHANG Sanguo
  • Supported by:
    University of Chinese Academy of Sciences (Y95401TXX2), Beijing Natural Science Foundation (Z190004), and Key Program of Joint Funds of the National Natural Science Foundation of China (U19B2040)

Abstract: Ordinal multi-class classification methods have been widely studied. Traditional methods assume that the class labels of the samples contain no mislabeling. However, because real data are complex and human expertise is limited, obtaining perfectly labeled samples is unrealistic, so conventional methods are limited in such settings. In this article, we propose an ordinal mislabel logistic regression method based on γ-divergence that is strongly robust when dealing with ordinal response data containing mislabeled responses. That is, when a sample is mislabeled, its weight in the parameter estimating equation is smaller than when it is correctly labeled. We construct the model by minimizing the γ-divergence and solve it with a gradient descent algorithm. The method is not only robust, but the mislabel probabilities can also be ignored in the model. Both simulation studies and real data analysis show that the method, namely robust ordinal mislabel logistic regression, performs well on ordinal classification problems with mislabeled responses.

Key words: γ-divergence, logistic regression, mislabeled response, ordinal classification, robustness
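
The down-weighting property described in the abstract can be illustrated with a minimal sketch. The abstract does not give the model's formulas, so everything below is an assumption made for illustration: a proportional-odds (cumulative logit) parameterization, a commonly used γ-divergence objective for conditional class probabilities, and the hypothetical names `ordinal_probs`, `gamma_weight` and `gamma_loss`. It is not the authors' implementation, and the gradient descent step is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(x, beta, thresholds):
    """Class probabilities under an assumed proportional-odds model:
    P(Y <= k | x) = sigmoid(thresholds[k] - x . beta), thresholds increasing."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    cum = [sigmoid(t - eta) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]


def gamma_weight(probs, label, gamma):
    """Per-sample factor p_y^gamma / (sum_k p_k^(1+gamma))^(gamma/(1+gamma)).
    Assumed form of the gamma-divergence term; it is small when the
    observed label is unlikely under the current model."""
    norm = sum(p ** (1.0 + gamma) for p in probs) ** (gamma / (1.0 + gamma))
    return probs[label] ** gamma / norm


def gamma_loss(X, y, beta, thresholds, gamma):
    """Empirical gamma-type loss: -(1/gamma) * log(mean of per-sample factors)."""
    factors = [
        gamma_weight(ordinal_probs(x, beta, thresholds), yi, gamma)
        for x, yi in zip(X, y)
    ]
    return -math.log(sum(factors) / len(factors)) / gamma


# Toy illustration with made-up parameters: three ordered classes (0, 1, 2).
beta = [2.0]
thresholds = [-1.0, 1.0]
gamma = 0.5

probs = ordinal_probs([3.0], beta, thresholds)  # model strongly favours class 2
w_clean = gamma_weight(probs, 2, gamma)         # sample labelled as class 2
w_mislabeled = gamma_weight(probs, 0, gamma)    # same sample labelled as class 0
loss = gamma_loss([[3.0], [3.0]], [2, 0], beta, thresholds, gamma)
```

In this toy setting the sample carrying the unlikely label contributes a much smaller factor to the objective than the same sample with the likely label, which is the sense in which a mislabeled sample has less influence on the estimate. No mislabel probability appears anywhere in the computation.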

CLC number: