Journal of University of Chinese Academy of Sciences

Robust semi-supervised learning model based on model averaging and γ-divergence

WU Huizhen, ZHANG Sanguo   

  1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100049, China
  Received: 2024-01-03; Revised: 2024-04-18

Abstract: Semi-supervised learning is a key research problem in pattern recognition and machine learning and has been widely applied in many fields in recent years. In practice, labeled samples are costly to obtain, whereas unlabeled samples are easy to collect but carry no label information. Semi-supervised learning therefore uses a large amount of unlabeled data together with a small amount of labeled data to carry out pattern recognition tasks. In this paper, we propose a robust semi-supervised approach based on model averaging and γ-divergence: on the one hand, model averaging is introduced to address the low quality of unlabeled data; on the other hand, logistic regression based on γ-divergence is introduced to address mislabeled samples in the labeled data. One advantage of the proposed model is that it exploits the differences among the predictions of different models, so that it makes effective use of the information in the unlabeled data while minimizing the influence of the harmful information it contains. In addition, γ-divergence reduces the effect of mislabeled samples in the labeled data on the model fit, so the resulting model is robust with respect to both the unlabeled and the labeled data. Simulation studies and an application to the Breast Cancer Wisconsin dataset show that, compared with existing semi-supervised learning methods, the proposed method significantly improves prediction performance when data quality is low.
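The abstract does not state the exact γ-divergence objective, so the following is only a minimal sketch under an assumed form: the γ-cross-entropy for binary logistic regression with per-observation normalization, L(β) = -(1/γ) log[(1/n) Σ_i w_i], where π_i = σ(x_i^T β) and w_i = p_β(y_i | x_i)^γ / (π_i^(1+γ) + (1 - π_i)^(1+γ))^(γ/(1+γ)) (the pseudo-spherical score of the observed label). Each observation enters the gradient with a weight proportional to w_i, so a label that the current fit finds improbable contributes little, and as γ → 0 the loss tends to the ordinary logistic negative log-likelihood. This is not the authors' implementation and it leaves out the model-averaging step for unlabeled data; the helper names (`gamma_weight`, `gamma_gradient`, `fit`), the fixed-step gradient-descent solver, γ = 0.5, labels coded 0/1 and the simulated data are all hypothetical.

```python
import math
import random

# Sketch only: an assumed gamma-divergence loss for logistic regression,
# not the paper's exact estimator. Labels are coded 0/1.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gamma_weight(pi, y, gamma):
    """w = p(y)^gamma / (pi^(1+gamma) + (1-pi)^(1+gamma))^(gamma/(1+gamma)).

    Small when the observed label y is improbable under the model, so a
    suspected mislabel contributes little; gamma = 0 gives w = 1.
    """
    p_obs = pi if y == 1 else 1.0 - pi
    s = pi ** (1 + gamma) + (1 - pi) ** (1 + gamma)
    return p_obs ** gamma / s ** (gamma / (1 + gamma))


def gamma_gradient(beta, X, y, gamma):
    """Gradient of L(beta) = -(1/gamma) * log(mean_i w_i).

    With gamma = 0 this is the gradient of the mean logistic negative
    log-likelihood (the gamma -> 0 limit of L).
    """
    grad = [0.0] * len(beta)
    w_sum = 0.0
    for xi, yi in zip(X, y):
        pi = sigmoid(sum(b * v for b, v in zip(beta, xi)))
        s = pi ** (1 + gamma) + (1 - pi) ** (1 + gamma)
        w = gamma_weight(pi, yi, gamma)
        # (1/gamma) * d log(w_i) / d eta_i with eta_i = x_i . beta;
        # the second term vanishes when gamma = 0.
        score = (yi - pi) - pi * (1 - pi) * (pi ** gamma - (1 - pi) ** gamma) / s
        w_sum += w
        for j, v in enumerate(xi):
            grad[j] -= w * score * v
    # Weighted average of per-observation scores, weights w_i / sum(w).
    return [g / w_sum for g in grad]


def fit(X, y, gamma, lr=0.5, n_iter=2000):
    """Fixed-step gradient descent from beta = 0 (no line search or stopping rule)."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = gamma_gradient(beta, X, y, gamma)
        beta = [b - lr * gj for b, gj in zip(beta, g)]
    return beta


# Hypothetical simulated data (not from the paper): P(y=1|x) = sigmoid(2x),
# then 20% of the labels are flipped at random to mimic mislabeling.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    x = rng.uniform(-3, 3)
    label = 1 if rng.random() < sigmoid(2 * x) else 0
    if rng.random() < 0.2:
        label = 1 - label
    X.append([1.0, x])  # intercept column plus one feature
    y.append(label)

beta_mle = fit(X, y, gamma=0.0)  # ordinary logistic regression
beta_gam = fit(X, y, gamma=0.5)  # gamma-divergence fit
print("logistic MLE:", beta_mle)
print("gamma = 0.5 :", beta_gam)
```

Printing both coefficient vectors allows a side-by-side comparison of the ordinary and γ-divergence fits on the same contaminated sample; how large any difference is depends on γ, the flip rate and where the mislabeled points fall, so no particular outcome is claimed here.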

Key words: semi-supervised learning, model averaging, γ-divergence, robustness