
Journal of University of Chinese Academy of Sciences, 2022, Vol. 39, Issue (3): 360-368. DOI: 10.7523/j.ucas.2020.0019

• Electronic Information and Computer Science •

Name disambiguation based on encoding attributes and graph topology

MA Yingying1,2,3, WU Youlong1, TANG Hua1,2,3

  1. School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China;
  2. Shanghai Institute of Microsystem & Information Technology, Chinese Academy of Sciences, Shanghai 200050, China;
  3. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2020-02-17 Revised: 2020-04-03 Published: 2021-05-31
  • Corresponding author: MA Yingying
  • Supported by: National Natural Science Foundation of China (61901267)

Abstract: To address author name ambiguity, we propose a name disambiguation method based on attribute encoding and graph embedding. A word2vec model first encodes the attribute features of each document to build its representation vector. A graph auto-encoder then encodes the relationships between documents into these vectors, so that similar documents cluster together. To further improve clustering accuracy, a graph embedding model introduces the topology of the document-document network and the author-author network into the document vectors, moving related documents closer still. The method uses document attributes and several relationship networks at the same time and learns the document representation vectors without supervision. Experiments on the real-world author dataset AMiner show that it significantly outperforms several state-of-the-art graph-based methods.

Key words: name disambiguation, graph neural network, clustering method, feature extraction, graph embedding
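
The pipeline outlined in the abstract (encode document attributes, fold in document relationships, then cluster) can be illustrated with a toy sketch. This is not the authors' implementation: the hand-written `WORD_VECS` table stands in for a trained word2vec model, `propagate` is a training-free neighbor-averaging step used in place of a graph auto-encoder, and `cluster` is a simple cosine-threshold grouping. All names, vectors, edges, and the 0.9 threshold are hypothetical choices made for illustration only.

```python
import numpy as np

# Hypothetical toy word vectors standing in for a trained word2vec model.
WORD_VECS = {
    "graph": np.array([1.0, 0.0, 0.0]),
    "embedding": np.array([0.9, 0.1, 0.0]),
    "network": np.array([0.8, 0.2, 0.0]),
    "protein": np.array([0.0, 1.0, 0.0]),
    "folding": np.array([0.0, 0.9, 0.1]),
    "enzyme": np.array([0.1, 0.8, 0.1]),
}


def encode_document(words):
    """Attribute encoding: average the vectors of the known words."""
    vecs = [WORD_VECS[w] for w in words if w in WORD_VECS]
    if not vecs:
        return np.zeros(3)
    return np.mean(vecs, axis=0)


def propagate(features, adjacency):
    """One round of neighbor averaging with self-loops.

    Assumption: a training-free stand-in for the encoder of a
    graph auto-encoder, not the model described in the paper.
    """
    a_hat = adjacency + np.eye(len(adjacency))
    degree = a_hat.sum(axis=1, keepdims=True)
    return (a_hat / degree) @ features


def cluster(vectors, threshold=0.9):
    """Group documents linked by cosine similarity >= threshold."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    n = len(vectors)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first walk over the similarity graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i, j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Four documents that all carry the same ambiguous author name.
docs = [
    ["graph", "embedding"],
    ["graph", "network"],
    ["protein", "folding"],
    ["enzyme", "protein"],
]
# Hypothetical document-document edges (e.g. a shared coauthor).
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

features = np.stack([encode_document(d) for d in docs])
embedded = propagate(features, adjacency)
labels = cluster(embedded)
print(labels)
```

On this toy input the first two documents fall into one cluster and the last two into another, that is, two distinct authors behind the same name. A real system would replace each stand-in with a trained model and a proper clustering algorithm.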
