SA-DBSCAN: A self-adaptive density-based clustering algorithm

doi:10.7523/j.issn.2095-6134.2009.4.015

Journal of University of Chinese Academy of Sciences, 2009, Vol. 26, Issue (4): 530-538.

XIA Lu-Ning, JING Ji-Wu

State Key Laboratory of Information Security, Graduate University of Chinese Academy of Sciences, Beijing 100049, China

Received: 2008-06-26  Revised: 2008-12-25  Published: 2009-07-15
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (2003AA144050)

Abstract

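DBSCAN is a classic density-based clustering algorithm: it determines the number of clusters automatically and handles clusters of arbitrary shape effectively. However, DBSCAN requires the user to specify two parameters, Eps and minPts, so the clustering process cannot run without manual intervention. Building on DBSCAN, this paper proposes the SA-DBSCAN clustering algorithm, which determines Eps and minPts automatically by analyzing the statistical characteristics of the dataset. This removes manual intervention and makes the clustering process fully automatic. Experiments show that SA-DBSCAN selects reasonable values for Eps and minPts and produces clustering results of relatively high accuracy.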

Keywords: data mining, clustering, DBSCAN, SA-DBSCAN
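The abstract says what Eps and minPts control and that SA-DBSCAN derives them from the dataset's statistics, but not how. The Python sketch below is an illustration under stated assumptions, not the authors' code. `dbscan` is a plain brute-force version of the standard algorithm, showing how Eps (the neighborhood radius) and minPts (the number of points a core point needs within Eps) shape the result. The hypothetical helpers `region_query`, `k_distances` and `estimate_eps` are names chosen here; `estimate_eps` picks Eps at the knee of the sorted k-distance curve, a generic rule in the spirit of the k-dist graph proposed with the original DBSCAN. That knee rule and the choice minPts = k + 1 are assumptions; they are not the statistical procedure SA-DBSCAN actually uses.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
UNVISITED, NOISE = 0, -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Brute-force DBSCAN: cluster ids 1, 2, ... per point, or NOISE (-1)."""
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # i is a core point: it seeds a new cluster
        labels[i] = cluster
        seeds, k = list(neighbors), 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is core too: keep expanding
                seeds.extend(j_neighbors)
    return labels


def k_distances(points: Sequence[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest other point (needs len(points) > k),
    sorted in descending order."""
    kd = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        kd.append(others[k - 1])
    return sorted(kd, reverse=True)


def estimate_eps(points: Sequence[Point], k: int) -> float:
    """Hypothetical heuristic, not SA-DBSCAN's procedure: Eps at the knee of the
    sorted k-dist curve, after normalizing both axes to [0, 1]."""
    kd = k_distances(points, k)
    n, hi, lo = len(kd), kd[0], kd[-1]
    if n < 2 or hi == lo:
        return hi
    # The normalized curve runs from (0, 1) to (1, 0); take the point lying
    # farthest below the chord x + y = 1, i.e. the one minimizing x + y.
    knee = min(range(n), key=lambda i: i / (n - 1) + (kd[i] - lo) / (hi - lo))
    return kd[knee]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (5, 5)]
    k = 3
    eps = estimate_eps(data, k)
    # A point whose k-dist <= eps has at least k + 1 points (itself included)
    # within eps, so min_pts = k + 1 (an assumption here) makes it a core point.
    print(eps, dbscan(data, eps, min_pts=k + 1))
```

Run as a script, this should report an Eps of about 1.414 (the diagonal of each unit square) and the labels [1, 1, 1, 1, 2, 2, 2, 2, -1]: two clusters and one noise point. Both `dbscan` and `k_distances` compare every pair of points, so they are O(n²) and suited only to small illustrative inputs; a spatial index such as an R*-tree is the usual remedy.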

CLC number: TP181

XIA Lu-Ning, JING Ji-Wu. SA-DBSCAN: A self-adaptive density-based clustering algorithm[J]. Journal of University of Chinese Academy of Sciences, 2009, 26(4): 530-538.
