A clustering algorithm based on maximum density

doi:10.7523/j.issn.2095-6134.2009.4.016

Journal of University of Chinese Academy of Sciences, 2009, Vol. 26, Issue 4: 539-548. DOI: 10.7523/j.issn.2095-6134.2009.4.016

Maximum density clustering algorithm

WANG Jing¹,², XIA Lu-Ning², JING Ji-Wu²

1. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China;
2. State Key Lab of Information Security, Graduate University of the Chinese Academy of Sciences, Beijing 100049, China

Received: 2008-10-08 Revised: 2009-01-09 Published: 2009-07-15
Corresponding author: WANG Jing
Funding:
Supported by the National 863 Program (2006AA01Z454) and the Electronic Information Industry Development Fund

Abstract

This paper proposes the maximum density clustering algorithm (MDCA), a partitioning clustering method that incorporates ideas from density-based clustering. MDCA takes the object of maximum density as its starting point, examines the density distribution of the region surrounding that object to delimit basic clusters, and then merges the basic clusters to obtain the final clustering. Experiments show that MDCA determines the number of clusters automatically and effectively discovers clusters of arbitrary shape, and that both its ability to handle unknown datasets and its clustering accuracy are better than those of traditional partition-based clustering algorithms.

Key words: data mining, clustering algorithm, densest object, k-means, DBSCAN
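
The abstract describes three stages: seed at the densest object, delimit basic clusters around it, then merge basic clusters. This page does not give the paper's density definition, partition rule, or merge criterion, so the following is only a toy sketch of that general flow under assumptions of my own, not the authors' MDCA. The function name `mdca_sketch`, the parameters `radius` and `merge_dist`, the neighbour-count density, and the single-link merge rule are all hypothetical.

```python
import numpy as np


def mdca_sketch(points, radius, merge_dist):
    """Toy density-seeded clustering; returns one integer label per point.

    Assumptions (not from the paper): density is a neighbour count within
    `radius`, a basic cluster is the unassigned points within `radius` of
    its seed, and basic clusters merge when their closest members are
    within `merge_dist`.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances (acceptable for small n).
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Assumed density estimate: number of points within `radius` (incl. self).
    density = (dist <= radius).sum(axis=1)

    # Stage 1: repeatedly seed a basic cluster at the densest unassigned point.
    basic = np.full(n, -1)
    k = 0
    while (basic == -1).any():
        free = np.flatnonzero(basic == -1)
        seed = free[np.argmax(density[free])]
        members = free[dist[seed, free] <= radius]  # always contains the seed
        basic[members] = k
        k += 1

    # Stage 2: merge basic clusters with a small union-find structure.
    parent = list(range(k))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a in range(k):
        for b in range(a + 1, k):
            gap = dist[np.ix_(basic == a, basic == b)].min()
            if gap <= merge_dist:
                parent[find(a)] = find(b)

    # Relabel merged groups as 0, 1, 2, ... in order of first appearance.
    roots = [find(i) for i in range(k)]
    relabel = {r: j for j, r in enumerate(dict.fromkeys(roots))}
    return np.array([relabel[roots[c]] for c in basic])


# Two well-separated, elongated groups of points on a line.
pts = [(x, 0.0) for x in range(5)] + [(x, 0.0) for x in range(20, 25)]
labels = mdca_sketch(pts, radius=1.0, merge_dist=1.0)
print(labels)
```

In this sketch the number of clusters is never passed in: it falls out of how many merged groups remain, which mirrors the abstract's claim of automatic cluster-count detection. The two parameters still have to be chosen by hand here, and how the paper sets its own thresholds is not stated on this page.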

CLC number: TP181

WANG Jing, XIA Lu-Ning, JING Ji-Wu. Maximum density clustering algorithm[J]. Journal of University of Chinese Academy of Sciences, 2009, 26(4): 539-548.


