
Journal of University of Chinese Academy of Sciences ›› 2019, Vol. 36 ›› Issue (4): 449-460. DOI: 10.7523/j.issn.2095-6134.2019.04.003

• Mathematics and Physics •

Angle-based varying-coefficient multi-category support vector machine

KANG Wenjia1, LIN Wenhui2, ZHANG Sanguo1

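  1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China;
  2. Technology Research Institute, Aisino Corporation, Beijing 100195, China
  • Received: 2017-11-27 Revised: 2018-04-23 Published: 2019-07-15
  • Corresponding author: ZHANG Sanguo
  • Funding:
    Supported by the open project of Hubei Collaborative Innovation Center for Early Warning and Emergency Response Technology (JD20150402)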

Targeted local angle-based multi-category support vector machine

KANG Wenjia1, LIN Wenhui2, ZHANG Sanguo1   

  1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China;
  2. Technology Research Institute, Aisino Corporation, Beijing 100195, China
  • Received: 2017-11-27 Revised: 2018-04-23 Published: 2019-07-15
  • Supported by:
    Supported by the open project of Hubei Collaborative Innovation Center for Early Warning and Emergency Response Technology (JD20150402)

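Abstract (Chinese version): The support vector machine (SVM) is a classic classification algorithm in machine learning and has long been popular among data scientists. Whether the data are linearly separable or not, the traditional SVM handles binary classification well. Given a sample, the SVM obtains the optimal decision boundary by maximizing the minimum margin and uses it to predict the class of new observations. Real-world data, however, are more complex and varied. On the one hand, the data often have more than two classes, and a number of good multi-category SVM algorithms have appeared in recent years. On the other hand, the feature set of data from a given field may contain a relatively special variable (called the targeted variable), which needs to be singled out and treated specially so as to preserve its important influence on the final classification result. Taking both aspects into account, we propose the targeted local angle-based multi-category support vector machine (TLAMSVM) to solve multi-category problems that contain a targeted variable. It performs multi-category classification within the angle-based maximum margin classification framework, which has a better geometric interpretation, and introduces a varying-coefficient model, handling the effect of the targeted variable on the model by choosing a suitable local smoothing function. TLAMSVM is applied to both simulated and real data sets. The numerical results show that TLAMSVM achieves higher prediction accuracy than multi-category SVMs that do not use the varying-coefficient idea or the angle-based multi-category framework.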

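Keywords (Chinese version): local smoothing, multi-category support vector machine, angle-based maximum margin classification framework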

Abstract: The support vector machine (SVM) is one of the most concise and efficient classification methods in machine learning. Traditional SVMs mainly handle binary classification problems by maximizing the smallest margin. Real-world data, however, are much more complicated. On the one hand, the label set usually has more than two categories, so SVMs need to be generalized to solve multi-category problems. On the other hand, there may exist one special variable, such as age in bioscience, that should be singled out from the other variables to preserve its effect on the final results. We call such a variable the targeted variable. To take both aspects into account, we propose the targeted local angle-based multi-category support vector machine (TLAMSVM). The new model not only solves multi-category problems but also pays special attention to the targeted variable. TLAMSVM performs multi-category classification within the angle-based framework, which offers a better geometric interpretation, and uses a local smoothing method to pool the information in the targeted variable. To validate the classification performance of TLAMSVM, we apply it to both simulated and real data sets; in the numerical experiments it achieves higher prediction accuracy than multi-category SVMs that lack either the varying-coefficient idea or the angle-based framework.

Key words: local smoothing, multi-category support vector machine, angle-based maximum margin classification framework
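
The abstract names two ingredients, an angle-based multi-category framework and local smoothing over the targeted variable, without giving formulas. The Python sketch below illustrates both under stated assumptions and is not the authors' implementation. It assumes the simplex-vertex construction commonly used in angle-based classification (K classes mapped to K unit vectors in K-1 dimensions with equal pairwise angles, and prediction by the largest inner product) and a Gaussian kernel for the local weights. The function names `simplex_vertices`, `predict`, and `gaussian_kernel_weights` are hypothetical.

```python
import numpy as np


def simplex_vertices(k):
    """Return a (k, k-1) array of unit vectors with equal pairwise angles.

    Assumed construction: the regular-simplex vertices commonly used in
    angle-based multi-category classification (not taken from this page).
    """
    w = np.empty((k, k - 1))
    w[0] = (k - 1) ** -0.5                      # first vertex: scaled all-ones vector
    a = -(1 + np.sqrt(k)) / (k - 1) ** 1.5      # common offset for the other vertices
    b = np.sqrt(k / (k - 1))                    # extra weight on one coordinate
    for j in range(1, k):
        w[j] = a
        w[j, j - 1] += b
    return w


def predict(f, vertices):
    """Assign each row of f, shape (n, k-1), to the class whose vertex
    has the largest inner product with it (the smallest angle)."""
    return np.argmax(f @ vertices.T, axis=1)


def gaussian_kernel_weights(u, u0, h):
    """Local smoothing weights for targeted-variable values u around the
    point u0 with bandwidth h. The Gaussian kernel is an assumed choice."""
    return np.exp(-0.5 * ((u - u0) / h) ** 2)


vertices = simplex_vertices(3)
gram = vertices @ vertices.T                    # diagonal 1, off-diagonal -1/(k-1)
labels = predict(vertices, vertices)            # each vertex is closest to itself
weights = gaussian_kernel_weights(np.array([0.0, 1.0, 2.0]), 0.0, 1.0)
```

In a varying-coefficient model of this kind, weights such as `weights` would typically multiply each observation's loss when the classifier is fitted at a given value `u0` of the targeted variable, so observations with nearby targeted-variable values count the most.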
