Content-based clustered P2P search model depending on set distance

doi:10.7523/j.issn.2095-6134.2007.2.016

Journal of University of Chinese Academy of Sciences, 2007, Vol. 24, Issue 2: 241-247. DOI: 10.7523/j.issn.2095-6134.2007.2.016

Content-based clustered P2P search model depending on set distance

WANG Jing, ZHANG Huan-Jie, YANG Shou-Bao, GAO Ying

Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026

Received:1900-01-01 Revised:1900-01-01 Published:2007-03-15

Abstract

Abstract: In content-based unstructured P2P search systems, two main issues directly affect query effectiveness and search cost: the complexity of computing document similarity in a high-dimensional semantic space, and the large number of redundant messages generated by flooding. This paper proposes a content-based clustered P2P search model built on set distance to improve query efficiency and reduce redundant messages. The model defines document similarity in terms of set distance, which keeps the cost of computing document similarity within linear time and thereby reduces query time. It also clusters peers by content using the set distance between peers, which both shortens query time and cuts redundant messages. Simulations show that the model not only achieves relatively high recall but also reduces search cost and query time to about 40% and 30%, respectively, of those of the Gnutella system.

Keywords: peer-to-peer (P2P), Gnutella, distributed hash table (DHT), set distance, vector space model
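
The abstract does not give the formal definition of set distance, so the following is only a minimal sketch under stated assumptions. It takes set distance to be the size of the symmetric difference divided by the size of the union (the Jaccard distance), represents each document or peer as a set of terms, and groups peers with a simple greedy threshold rule. The names `set_distance`, `cluster_peers`, and `threshold` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def set_distance(a: Set[str], b: Set[str]) -> float:
    """Assumed definition (not from the paper): |A xor B| / |A or B|.

    Returns 0.0 for identical sets and 1.0 for disjoint non-empty sets.
    """
    if not a and not b:
        return 0.0  # two empty sets are treated as identical
    return len(a ^ b) / len(a | b)


def cluster_peers(peers: Dict[str, Set[str]], threshold: float) -> List[List[str]]:
    """Greedy clustering sketch (an assumption, not the paper's procedure).

    Each peer joins the first cluster whose first member is within
    `threshold` set distance; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for name, terms in peers.items():
        for cluster in clusters:
            representative = cluster[0]
            if set_distance(terms, peers[representative]) <= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


if __name__ == "__main__":
    example_peers = {
        "peer_a": {"p2p", "search", "gnutella"},
        "peer_b": {"p2p", "search", "dht"},
        "peer_c": {"cooking", "recipes"},
    }
    print(cluster_peers(example_peers, threshold=0.6))
```

With hash-based sets, the distance is computed in time linear in the sizes of the two sets on average, which is the property the abstract highlights. The paper's own definition and clustering procedure may differ from this sketch.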

CLC number:

TP393

WANG Jing, ZHANG Huan-Jie, YANG Shou-Bao, GAO Ying. Content-based clustered P2P search model depending on set distance[J]. Journal of University of Chinese Academy of Sciences, 2007, 24(2): 241-247.
