Journal of University of Chinese Academy of Sciences ›› 2016, Vol. 33 ›› Issue (5): 711-719. DOI: 10.7523/j.issn.2095-6134.2016.05.020

• Brief Report •

基于LSI的日地空间领域科学数据语义检索模型

刘春蔚1,2, 邹自明1, 佟继周1   

  1 中国科学院国家空间科学中心, 北京 100190;
  2 中国科学院大学, 北京 100049
  • 收稿日期:2016-01-07 修回日期:2016-04-01 发布日期:2016-09-15
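  • Corresponding author: TONG Jizhou
  • Supported by: the Special Program for Informatization of the Chinese Academy of Sciences (XXH12504-08) and the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA04080000)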

LSI-based semantic retrieval model for scientific data in solar-terrestrial space field

LIU Chunwei1,2, ZOU Ziming1, TONG Jizhou1   

  1 National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China;
  2 University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2016-01-07 Revised:2016-04-01 Published:2016-09-15

摘要:

日地空间系统科学的数据具有体量大、种类多、结构复杂的特征,不同概念、不同事件之间的相互关联为该领域内的科学数据检索提出了很高的要求.然而目前该领域内依然以基于传统的关键词检索技术为主,严重影响检索结果的质量.提出一种数据语义检索模型,它是在对日地空间学科元信息提取的基础上,使用文本处理的方法将提取信息转换为词项-文档矩阵,进一步使用潜在语义索引技术对其进行分析,计算出检索条目与不同数据集的语义相关度,从而根据语义相关度向用户推荐科学数据.实验对比表明,该模型的召回率明显优于传统方法,且具有很高的准确率.该模型同时支持对科学数据进行语义标注和关键词提取,亦可用于其他领域科学数据检索.

关键词: 日地空间, 科学数据, 语义检索, 浅层语义索引, 元数据

Abstract:

Scientific data in solar-terrestrial space science are large in volume, diverse in type, and complex in structure. The correlations among domain concepts and events place high demands on scientific data retrieval in this field. However, the retrieval modules of the mainstream data sharing and publishing systems in this field still rely on conventional keyword-based retrieval, which seriously degrades the quality of the retrieval results. We present a semantic retrieval model for solar-terrestrial space science data. From the semantic information extracted from the metadata of each dataset, we build a TF-IDF term-document matrix using standard text processing methods. Latent semantic indexing (LSI) is then applied to this matrix to compute the semantic similarity between a search request and each dataset, and datasets are ranked and recommended to the user according to this similarity. Experimental comparison shows that the model achieves clearly higher recall than conventional methods while maintaining high precision. The model also supports semantic annotation and keyword extraction for scientific data, and it can be applied to scientific data retrieval in other disciplines as well.
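The pipeline summarized in the abstract (metadata text, TF-IDF term-document matrix, LSI, similarity ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the dataset names and metadata snippets in `DOCS`, the helper names `build_tfidf` and `lsi_rank`, the IDF variant, and the choice of cosine similarity in a rank-`k` latent space are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical metadata snippets; real input would be text extracted
# from the scientific metadata of each dataset.
DOCS = {
    "ds_xray": "solar flare xray",
    "ds_radio": "solar flare radio",
    "ds_index": "geomagnetic storm index",
    "ds_field": "geomagnetic storm field",
}


def build_tfidf(texts):
    """Return (term -> row index, IDF vector, TF-IDF term-document matrix)."""
    tokens = [text.split() for text in texts]
    vocab = sorted({term for doc in tokens for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    tf = np.zeros((len(vocab), len(texts)))
    for j, doc in enumerate(tokens):
        for term in doc:
            tf[index[term], j] += 1.0
    df = np.count_nonzero(tf, axis=1)       # number of documents containing each term
    idf = np.log(len(texts) / df) + 1.0     # one common IDF variant (assumed here)
    return index, idf, tf * idf[:, None]


def lsi_rank(query, docs, k=2):
    """Rank datasets by cosine similarity to the query in a rank-k LSI space."""
    names = list(docs)
    index, idf, a = build_tfidf([docs[n] for n in names])
    # Truncated SVD: keep the k strongest latent "concept" directions.
    u, s, _ = np.linalg.svd(a, full_matrices=False)
    u_k = u[:, : min(k, len(s))]
    # Project documents and query with the same map U_k^T.
    doc_vecs = (u_k.T @ a).T
    q = np.zeros(len(index))
    for term in query.split():
        if term in index:                   # terms outside the vocabulary are ignored
            q[index[term]] += 1.0
    q_hat = u_k.T @ (q * idf)
    num = doc_vecs @ q_hat
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat)
    # Similarity is defined as 0 when the query has no known terms.
    sims = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    return sorted(zip(names, sims.tolist()), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    for name, score in lsi_rank("xray", DOCS, k=2):
        print(f"{name}: {score:.3f}")
```

In this toy corpus, the query `xray` matches only `ds_xray` by keyword, yet `ds_radio` receives essentially the same similarity because both share the "solar flare" concept in the latent space, while the two geomagnetic storm datasets score near zero. That is the mechanism by which LSI-style retrieval can raise recall over plain keyword matching.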

Key words: solar-terrestrial space, scientific data, semantic retrieval, LSI, metadata

CLC number: