
Journal of University of Chinese Academy of Sciences ›› 2014, Vol. 31 ›› Issue (1): 124-129. DOI: 10.7523/j.issn.2095-6134.2014.01.018

• Computer Science •

An XBRL dimensional data parsing algorithm based on the Map/Reduce parallel programming model

ZHU Jianpeng, WANG Ying, YANG Cheng

  1. College of Engineering and Information Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2013-04-26 Revised: 2013-05-20 Published: 2014-01-15
  • Corresponding author: ZHU Jianpeng, E-mail: zhujianpeng@ucas.ac.cn
  • Supported by:

    National Natural Science Foundation of China (61303155)

Abstract:

This article studies large-scale semi-structured data processing from the perspective of XBRL dimensional data and proposes an XBRL dimensional data parsing algorithm based on the Map/Reduce parallel programming model. Building on Map/Reduce and the StAX stream parsing technique, the algorithm addresses the complex data reference relationships among the XML files that make up an XBRL financial report by treating a whole report as the minimum processing unit. It first extracts, in parallel, the data contained in the dimensional fact items and then processes the business semantic data, thereby parsing complex XBRL dimensional data. A performance comparison shows that the algorithm has a significant advantage in large-scale XBRL data processing.

Key words: XBRL, semi-structured data processing, big data processing, Map/Reduce, XBRL dimension
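
The processing flow outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified XBRL instance in which dimensions appear as `xbrldi:explicitMember` elements inside contexts, it uses Python's `xml.etree.ElementTree.iterparse` as a stand-in for StAX stream parsing, and it replaces a Map/Reduce cluster with an in-process thread pool. The names `map_report`, `reduce_facts`, `run`, and `SAMPLE_REPORT`, along with the `ex:` taxonomy, are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

XBRLI = "http://www.xbrl.org/2003/instance"
XBRLDI = "http://xbrl.org/2006/xbrldi"

# Hypothetical, heavily simplified XBRL instance used only for illustration.
SAMPLE_REPORT = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:xbrldi="http://xbrl.org/2006/xbrldi"
    xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="c_asia">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/id">DEMO</xbrli:identifier>
      <xbrli:segment>
        <xbrldi:explicitMember dimension="ex:RegionAxis">ex:AsiaMember</xbrldi:explicitMember>
      </xbrli:segment>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2013-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <ex:Revenue contextRef="c_asia" unitRef="u" decimals="0">100</ex:Revenue>
</xbrli:xbrl>"""


def map_report(xml_text):
    """Map step: stream through ONE whole report and emit dimensional facts.

    The whole report is the unit of work, so context references can be
    resolved locally without looking at any other report.
    """
    dims_by_context = {}  # context id -> [(dimension, member), ...]
    facts = []            # (concept local name, contextRef, numeric value)
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == f"{{{XBRLI}}}context":
            dims_by_context[elem.get("id")] = [
                (m.get("dimension"), (m.text or "").strip())
                for m in elem.iter(f"{{{XBRLDI}}}explicitMember")
            ]
            elem.clear()  # release the subtree once it has been read
        elif elem.get("contextRef") is not None:
            concept = elem.tag.split("}")[-1]
            facts.append((concept, elem.get("contextRef"), float(elem.text)))
            elem.clear()
    # Join facts to their contexts only after the full pass, because the
    # order of contexts and facts in the file is not assumed.
    return [
        ((dimension, member, concept), value)
        for concept, ctx, value in facts
        for dimension, member in dims_by_context.get(ctx, [])
    ]


def reduce_facts(pairs):
    """Reduce step: sum values that share a (dimension, member, concept) key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run(reports):
    """Map every report in parallel, then reduce all emitted pairs."""
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_report, reports))
    return reduce_facts(pair for result in mapped for pair in result)
```

Calling `run([SAMPLE_REPORT])` returns a dictionary keyed by `(dimension, member, concept)`. Summation in the reduce step is only one possible choice of business-semantic processing; the abstract does not specify which aggregation the algorithm performs.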

CLC number: