
Journal of University of Chinese Academy of Sciences ›› 2018, Vol. 35 ›› Issue (4): 536-543. DOI: 10.7523/j.issn.2095-6134.2018.04.017

• Information and Electronic Science •

Imbalanced fuzzy multiclass support vector machine algorithm based on class-overlap degree undersampling

WU Yuanyuan, SHEN Liyong

  1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2017-05-02 Revised: 2017-06-02 Published: 2018-07-15
  • Corresponding author: SHEN Liyong
  • Supported by:
    Open Project of the Hubei Collaborative Innovation Center (JD20150402)

Abstract: Undersampling is a common data-reconstruction method for imbalanced classification. However, traditional undersampling tends to discard important sample information, and its experimental results are unstable. To address both problems, this paper proposes an imbalanced fuzzy multiclass support vector machine algorithm based on class-overlap degree undersampling. The algorithm first combines the local outlier factor (LOF) with a box-and-whisker plot to remove noise samples from the training dataset. It then uses the class-overlap degree to extract the support vectors that are critical to classification. Finally, the class-overlap degree of each sample, which reflects that sample's importance, is used as its membership value to construct the fuzzy multiclass support vector machine. Experimental results show that the algorithm overcomes the loss of important sample information and the instability of results seen in support vector machines with random undersampling. It also improves the classification accuracy of support vector machines on imbalanced, noisy datasets while maintaining high computational efficiency.

Key words: support vector machine, fuzzy multiclass support vector machine, noise, imbalanced data, class-overlap degree
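
The three stages named in the abstract (noise removal, overlap-based undersampling, membership-weighted SVM) can be illustrated with a rough Python sketch. This is not the authors' implementation. The abstract does not define the class-overlap degree, so the sketch assumes a simple stand-in: the fraction of a sample's k nearest neighbours that carry a different label. Running LOF separately per class, the box-plot fence Q3 + 1.5·IQR, the membership floor, and the function names `remove_noise`, `overlap_degree`, and `fit_fuzzy_svm` are likewise assumptions made for illustration. The fuzzy membership is approximated through scikit-learn's `sample_weight`, which rescales the penalty C per sample.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors
from sklearn.svm import SVC


def remove_noise(X, y, k=10):
    """Per class, drop samples whose LOF score exceeds the box-plot upper fence."""
    keep = np.zeros(len(y), dtype=bool)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        lof = LocalOutlierFactor(n_neighbors=k).fit(X[idx])
        scores = -lof.negative_outlier_factor_  # larger means more outlying
        q1, q3 = np.percentile(scores, [25, 75])
        keep[idx] = scores <= q3 + 1.5 * (q3 - q1)  # assumed fence: Q3 + 1.5*IQR
    return X[keep], y[keep]


def overlap_degree(X, y, k=10):
    """Assumed proxy: fraction of the k nearest neighbours with a different label."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)  # column 0 is normally the sample itself
    return (y[idx[:, 1:]] != y[:, None]).mean(axis=1)


def fit_fuzzy_svm(X, y, k=10, floor=0.05):
    """Clean, undersample every class to the smallest class size, fit a weighted SVM."""
    X, y = remove_noise(X, y, k)
    d = overlap_degree(X, y, k)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    selected = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        # keep the n_min samples of this class with the highest overlap degree
        selected.append(idx[np.argsort(-d[idx])[:n_min]])
    sel = np.concatenate(selected)
    # membership values act as per-sample weights; the floor avoids zero weights
    weights = np.clip(d[sel], floor, 1.0)
    clf = SVC(kernel="rbf")  # SVC handles multiclass problems one-vs-one
    clf.fit(X[sel], y[sel], sample_weight=weights)
    return clf, X[sel], y[sel]
```

Calling `fit_fuzzy_svm(X, y)` on a feature matrix `X` and an integer label vector `y` returns the fitted classifier together with the balanced subset it was trained on. Each class needs more than `k` samples for the neighbour queries to be meaningful.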
