Journal of University of Chinese Academy of Sciences ›› 2022, Vol. 39 ›› Issue (3): 302-308. DOI: 10.7523/j.ucas.2020.0038

• Mathematics •

Research of confidence intervals for precision matrix in high dimensional network data

ZHENG Zemin, ZHOU Huiting

  1. Department of Statistics and Finance, School of Management, University of Science and Technology of China, Hefei 230026, China
  • Received: 2020-04-15 Revised: 2020-06-28 Published: 2021-06-01
  • Corresponding author: ZHOU Huiting
  • Supported by:
    National Natural Science Foundation of China (1601501, 11671374, 71731010)

Abstract: With the development of the Internet and of science and technology, big data has surged on an unprecedented scale, giving rise to complex network data among different individuals. Confidence intervals for the precision (inverse covariance) matrix in graphical models play an important role in recovering the connections within such networks. A natural and important question is how to obtain these confidence intervals efficiently. This paper proposes the De-ISEE (De-innovated scalable efficient estimation) statistic, whose confidence intervals are efficient to compute while maintaining a desirable coverage rate. Simulation studies on network data demonstrate the method's advantages in both average coverage and computation. The De-ISEE method is also applied to riboflavin data and gene expression data, and the results suggest that it could be an important tool for studying gene associations.

Key words: network data, high-dimensional graphical models, confidence intervals, precision matrix, de-sparsified statistic

CLC number: