
Journal of University of Chinese Academy of Sciences ›› 2007, Vol. 24 ›› Issue (6): 771-777. DOI: 10.7523/j.issn.2095-6134.2007.6.008

• Article •

An Improved K-means Algorithm Based on Optimizing Initial Points

QIN Yu, JING Ji-Wu, XIANG Ji, ZHANG Ai-Hua

  1. The State Key Laboratory of Information Security (Graduate University of Chinese Academy of Sciences)
  • Received: 1900-01-01 Revised: 1900-01-01 Published: 2007-11-15

Abstract: K-means is an important clustering algorithm that is widely used in network information processing. Because K-means terminates at a local optimum, the choice of initial cluster centers strongly affects the quality of the resulting clustering. This paper proposes an improved K-means algorithm that first detects the relatively dense regions of the data set and then generates the initial cluster centers from those regions. The method effectively excludes the influence of cluster-edge points and noise points, and it adapts to data sets in which the actual classes have unbalanced density distributions, so it ultimately achieves better clustering results.

Key words: Clustering, K-means, Initial Points
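
The idea summarized in the abstract (locate dense regions first, then derive the initial centers from them) can be illustrated with a minimal Python sketch. The abstract does not specify how dense regions are detected, so everything below is an assumption made for illustration and is not the authors' algorithm: density is taken to be the number of points within a hypothetical radius `eps`, each initial center is the mean of the densest remaining neighbourhood, and points within `2 * eps` of a chosen center are set aside before the next center is picked. The function names `density_init` and `kmeans` are likewise hypothetical.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def density_init(points, k, eps):
    """Pick k initial centers from dense regions (illustrative heuristic only).

    Assumption: density = number of points within distance eps.
    """
    remaining = list(points)
    centers = []
    while len(centers) < k:
        if not remaining:
            raise ValueError("eps is too large to find k separate dense regions")
        # Neighbourhood of each remaining point: all points within eps of it.
        hoods = [[q for q in remaining if math.dist(p, q) <= eps]
                 for p in remaining]
        densest = max(hoods, key=len)
        center = mean(densest)
        centers.append(center)
        # Set aside the region around the new center so that the next
        # center is drawn from a different dense region.
        remaining = [q for q in remaining if math.dist(center, q) > 2 * eps]
    return centers


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = list(centers)
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # assignments have stabilized
            break
        centers = new_centers
    return centers, clusters


# Two tight groups plus one isolated noise point at (50, 50).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
init = density_init(points, k=2, eps=1.5)
print(init)  # [(0.5, 0.5), (10.5, 10.5)]
centers, clusters = kmeans(points, init)
```

In this toy data set the isolated point has a neighbourhood of size one, so it is never chosen as a starting center, which is the effect on noise points that the abstract describes. It is still assigned to a cluster during the iterations, because this sketch only changes the initialization and leaves the K-means loop itself unchanged.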
