Journal of University of Chinese Academy of Sciences, 2008, Vol. 26, Issue 6: 771-780. DOI: 10.7523/j.issn.2095-6134.2008.6.008

• Article

Privacy-preserving statistical quantitative rules mining

Jing Wei-Wei, Huang Liu-Sheng, Yao Yi-Fei, Xu Wei-Jiang

  1. Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China;
    National High Performance Computing Center at Hefei, Hefei 230027, China
  • Published: 2008-11-15

Abstract: Statistical quantitative (SQ) rules play an important role in data mining. Centralized algorithms for mining SQ rules already exist, but they cannot be applied directly to distributed data, especially when the private information of the participating parties must be protected. This paper considers how multiple parties that share distributed data can jointly mine SQ rules without revealing their private information, a problem that belongs to privacy-preserving data mining (PPDM). Building on three basic PPDM tools (secure sum, secure mean, and secure computation of frequent itemsets), the paper presents two algorithms that together accomplish privacy-preserving SQ rule mining over horizontally partitioned data. One algorithm securely computes the confidence intervals used to test the significance of rules; the other securely discovers the SQ rules. The correctness, security, and complexity of the algorithms are also analyzed.

Key words: secure multi-party computation, privacy-preserving data mining, statistical quantitative rules
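
The secure sum and secure mean tools named in the abstract can be illustrated with a minimal sketch. The abstract does not describe the paper's own protocols, so what follows is an assumption: the widely used ring-based secure sum, simulated in a single process, with a secure mean built from two secure sums (one over local totals, one over local record counts). The names `ring_secure_sum`, `secure_mean`, and `MODULUS` are hypothetical, and inputs are assumed to be non-negative integers whose true sum is smaller than the modulus.

```python
import secrets

# Assumption: the modulus is larger than any possible true sum.
MODULUS = 2**64


def ring_secure_sum(local_values, modulus=MODULUS):
    """Simulate a ring-based secure sum over one private value per party.

    Party 0 hides its value behind a random mask and passes the running
    total along the ring. Each later party adds its own value, so it only
    ever sees a masked partial sum. Party 0 removes the mask at the end.
    """
    mask = secrets.randbelow(modulus)
    running = (mask + local_values[0]) % modulus
    for value in local_values[1:]:
        running = (running + value) % modulus
    return (running - mask) % modulus


def secure_mean(local_sums, local_counts):
    """Global mean over horizontally partitioned data from two secure sums."""
    total = ring_secure_sum(local_sums)
    count = ring_secure_sum(local_counts)
    return total / count


# Three parties with local totals 10, 20, 12 over 2, 3, 1 records.
print(ring_secure_sum([10, 20, 12]))         # 42
print(secure_mean([10, 20, 12], [2, 3, 1]))  # 7.0
```

This sketch assumes semi-honest parties and at least three participants. It does not protect a party whose two ring neighbors collude, since they can subtract the value they sent from the value they received.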