
Journal of University of Chinese Academy of Sciences ›› 2022, Vol. 39 ›› Issue (2): 240-251. DOI: 10.7523/j.ucas.2020.0026

• Electronics, Information and Computer Science •

A new joint model for extracting overlapping relations based on deep learning

ZHAO Minjun1, ZHAO Yawei1, ZHAO Yajie2, LUO Gang2

  1 School of Engineering Science, University of Chinese Academy of Sciences, Beijing 100049, China;
  2 AI Lab of KnowLeGene Intelligent Technology Co., Ltd., Beijing 100088, China
  • Received: 2020-03-23 Revised: 2020-05-25 Published: 2021-06-01
  • Corresponding author: ZHAO Yawei
  • Supported by:
    the National Natural Science Foundation of China (61872331) and University of Chinese Academy of Sciences

Abstract: With the rapid development of Internet technologies and the spread of mobile devices, we are surrounded by all kinds of information at every moment. How to mine valuable information from massive data has long been a research hotspot at home and abroad. Relation extraction is an important subtask of information extraction: it aims to identify the relations between entities in text and thereby recover the structured information in the text, namely fact triples. Entity overlap and relation overlap are very common in text, but existing joint extraction models cannot handle them effectively. This paper therefore proposes a new joint extraction model that treats relation extraction as two subtasks, entity recognition and relation recognition, which are solved with a sequence labeling method and a multi-class classification method, respectively. To fully exploit the semantic information of the text during joint extraction, part-of-speech (POS) and syntactic dependency relation (Deprel) features are added to the input layer of the model. An attention mechanism is also introduced to alleviate the long-distance dependency problem that arises as sentence length increases. Finally, relation extraction experiments are conducted on the NYT and WebNLG datasets. The results show that the proposed model can effectively solve the problem of overlapping relations and achieves the best extraction performance.

Key words: relation extraction, entity overlap, joint extraction model, deep learning
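
To make the notion of overlapping fact triples concrete, the following minimal Python sketch groups pairs of triples from one sentence by how they overlap. It is an illustration only and not the model proposed in the paper: the function name `classify_overlaps`, the two overlap categories (triples sharing both entities versus triples sharing exactly one entity), and the example triples are all assumptions introduced here, following a common reading of entity and relation overlap.

```python
from itertools import combinations

# Hypothetical fact triples (subject, relation, object), invented for illustration.
example_triples = [
    ("Paris", "capital_of", "France"),
    ("Paris", "located_in", "France"),
    ("Eiffel Tower", "located_in", "Paris"),
    ("Tokyo", "capital_of", "Japan"),
]


def classify_overlaps(triples):
    """Split all pairs of triples into two assumed overlap categories.

    Returns (pair_overlaps, single_overlaps):
      pair_overlaps   -- pairs of triples built on the same two entities
      single_overlaps -- pairs of triples that share some, but not all, entities
    """
    pair_overlaps, single_overlaps = [], []
    for a, b in combinations(triples, 2):
        entities_a = {a[0], a[2]}
        entities_b = {b[0], b[2]}
        if entities_a == entities_b:
            # Same entity pair, so the two triples differ only in relation or direction.
            pair_overlaps.append((a, b))
        elif entities_a & entities_b:
            # At least one entity is shared, but the entity sets differ.
            single_overlaps.append((a, b))
    return pair_overlaps, single_overlaps


pair_overlaps, single_overlaps = classify_overlaps(example_triples)
```

In this toy input, the two triples about Paris and France fall into the first category, while the Eiffel Tower triple overlaps each of them through the single shared entity Paris. A tagger that assigns one label per token cannot represent either case directly, which is the motivation the abstract gives for splitting the task into entity recognition and relation classification.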
