
Journal of University of Chinese Academy of Sciences ›› 2021, Vol. 38 ›› Issue (2): 280-287. DOI: 10.7523/j.issn.2095-6134.2021.02.014

• Brief Report •

Missing data imputing algorithm based on modified neural process

SUN Xiaoli, GUO Yan, LI Ning, SONG Xiaoxiang

  1. PLA Army Engineering University, Nanjing 210007, China
  • Received: 2019-07-08 Revised: 2019-10-08 Published: 2021-03-15
  • Corresponding author: GUO Yan
  • Funding:
    Supported by the National Natural Science Foundation of China (61871400) and the Natural Science Foundation of Jiangsu Province (BK20171401)

Abstract: Missing data imputation is an important research topic in data analysis and processing, and it is especially difficult when only a small amount of data has been collected. To address this problem, a missing data imputing algorithm based on a modified neural process model is proposed, which improves imputation performance on small datasets. First, each observed time series is represented individually, and a neural network produces a representation vector for each one. Second, the neural process model is used to obtain the distribution function of the data, and a correction coefficient α is introduced in the training stage so that the sampling rate of the training data can be set more precisely according to the data missing rate. Finally, an imputing process is added, in which the trained model estimates the missing values. To verify the algorithm, simulation experiments are carried out on a sea surface temperature dataset and a Beijing PM2.5 dataset. The results show that the algorithm imputes well on small datasets and, compared with other algorithms, achieves a lower root mean square error at high missing rates.

Key words: missing data imputing, time series, modified neural process, correction coefficient
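The abstract does not give the formula that links the correction coefficient α to the training sampling rate, nor the model architecture, so the following is only a minimal sketch under stated assumptions. The rule `alpha * (1 - missing_rate)` is a hypothetical placeholder, all function names are illustrative, and linear interpolation stands in for the trained neural process. What the sketch does show is the evaluation idea the abstract relies on: hide a fraction of a time series, impute it, and score the result with the root mean square error on the hidden positions only.

```python
import numpy as np


def training_sampling_rate(missing_rate: float, alpha: float) -> float:
    """Hypothetical rule (an assumption, not taken from the paper):
    scale the observed fraction (1 - missing_rate) by alpha."""
    return float(np.clip(alpha * (1.0 - missing_rate), 1e-3, 1.0))


def sample_context(n_points: int, rate: float, rng: np.random.Generator) -> np.ndarray:
    """Draw a sorted random subset of time indices to use as training context."""
    k = max(1, int(round(rate * n_points)))
    return np.sort(rng.choice(n_points, size=k, replace=False))


def interpolate_baseline(t: np.ndarray, y: np.ndarray, observed: np.ndarray) -> np.ndarray:
    """Stand-in imputer: linear interpolation over the observed points.
    This replaces the neural process purely for illustration."""
    return np.interp(t, t[observed], y[observed])


def rmse_on_missing(truth: np.ndarray, imputed: np.ndarray, missing: np.ndarray) -> float:
    """Root mean square error computed only where values were missing."""
    diff = truth[missing] - imputed[missing]
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 50)
    y = np.sin(2.0 * np.pi * t)          # synthetic series, not the paper's data

    missing = np.zeros(50, dtype=bool)
    missing[rng.choice(np.arange(1, 49), size=30, replace=False)] = True  # 60% missing

    rate = training_sampling_rate(missing_rate=0.6, alpha=1.0)
    context = sample_context(50, rate, rng)
    imputed = interpolate_baseline(t, y, ~missing)

    print("sampling rate:", rate, "context size:", context.size)
    print("RMSE on missing entries:", rmse_on_missing(y, imputed, missing))
```

Scoring only the masked positions keeps the observed values, which any imputer reproduces trivially, from diluting the error.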
