
Journal of University of Chinese Academy of Sciences, 2009, Vol. 26, Issue (4): 443-450. DOI: 10.7523/j.issn.2095-6134.2009.4.003


Comparison of bias-variance structure of three classification algorithms: MCLP, LDA and C5.0

ZHU Mei-Hong1,2,3, SHI Yong1,2, LI Ai-Hua4, ZHANG Dong-Ling1,2

    1. Graduate University of the Chinese Academy of Sciences, Beijing 100080, China;
    2. Research Center on Fictitious Economy & Data Sciences, Chinese Academy of Sciences, Beijing 100080, China;
    3. School of Statistics, Capital University of Economics and Business, Beijing 100070, China;
    4. School of Management Science and Engineering, Central University of Finance and Economics, Beijing 100081, China
  • Received: 2008-11-17 Revised: 2009-03-02 Published: 2009-07-15
  • Corresponding author: ZHU Mei-Hong
  • Supported by:

    National Natural Science Foundation of China (70621001, 70531040, 70501030, 10601064, 70472074, 90718042), Beijing Natural Science Foundation (9073020), and the 973 Program (2004CB720103)

Abstract:

Based on Domingos's decomposition framework for expected prediction error, we compared the bias-variance structure of three classification algorithms, MCLP, LDA and C5.0, on three data sets. The experimental results show that, in general, C5.0 has low bias and high variance, LDA has high bias and low variance, and MCLP lies between the two but closer to LDA. When the training set is small, both the bias and the variance of MCLP are comparatively high. As the training set grows, however, the bias and variance of MCLP decrease markedly and can even fall below those of C5.0 and LDA. This study lays the basis for constructing an ensemble suited to MCLP.

Key words: multiple-criteria linear programming (MCLP), linear discriminant analysis (LDA), C5.0, bias, variance
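
The decomposition framework named in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal version of a Domingos-style decomposition for zero-one loss, assuming a two-class problem and noise-free labels. The function name `decompose_zero_one_loss`, its arguments, and the toy predictions are all hypothetical. Each row of `predictions` stands for the output of one model trained on a different training set and evaluated on the same test examples.

```python
from collections import Counter


def decompose_zero_one_loss(predictions, labels):
    """Bias-variance decomposition of zero-one loss (two classes, no noise).

    predictions[i][j] is the label predicted for test example j by the model
    trained on the i-th training set; labels[j] is the true label of example j.
    """
    n_models = len(predictions)
    n_examples = len(labels)
    bias = unbiased_var = biased_var = loss = 0.0

    for j, true_label in enumerate(labels):
        preds = [predictions[i][j] for i in range(n_models)]
        # Main prediction: the label predicted most often across training sets.
        main = Counter(preds).most_common(1)[0][0]
        # Variance: how often a single model disagrees with the main prediction.
        variance = sum(p != main for p in preds) / n_models
        # Average zero-one loss of the individual models on this example.
        loss += sum(p != true_label for p in preds) / n_models
        if main == true_label:
            # Unbiased example: disagreement with the main prediction adds error.
            unbiased_var += variance
        else:
            # Biased example: disagreement with the main prediction removes error.
            bias += 1.0
            biased_var += variance

    return {
        "loss": loss / n_examples,
        "bias": bias / n_examples,
        "unbiased_variance": unbiased_var / n_examples,
        "biased_variance": biased_var / n_examples,
        "net_variance": (unbiased_var - biased_var) / n_examples,
    }


if __name__ == "__main__":
    # Toy data (made up for illustration): 4 training sets, 3 test examples.
    toy_predictions = [
        [1, 0, 0],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 1],
    ]
    toy_labels = [1, 0, 1]
    for name, value in decompose_zero_one_loss(toy_predictions, toy_labels).items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the average loss equals the average bias plus the net variance, which makes it possible to compare how much of each algorithm's error comes from each component. In practice the rows of `predictions` would come from classifiers trained on resampled training sets of a fixed size.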

CLC number: