Frontiers of Data and Computing, 2021, Vol. 3, Issue (6): 50-59.

doi: 10.11871/jfdc.issn.2096-742X.2021.06.004

• Special Issue: Scientific Big Data Mining and Knowledge Discovery •

A Method for Discovering Interdisciplinary Research Content Based on Semantic Anomalies in Word Embeddings

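HE Tao1,*, WANG Guifang2, MA Tingcan2

  1. Department of Information Security, Naval University of Engineering, Wuhan, Hubei 430033, China
  2. Wuhan Library, Chinese Academy of Sciences, Wuhan, Hubei 430071, China
  • Received: 2021-10-26 Online: 2021-12-20 Published: 2022-01-26
  • Corresponding author: HE Tao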
  • About the authors:
    HE Tao, Ph.D., is an associate research librarian in the Department of Information Security, Naval University of Engineering. His main research interests include natural language processing and scientometrics. In this paper, he is responsible for proposing the approach and designing the experiments. E-mail: taohe@whu.edu.cn
    WANG Guifang, Ph.D., is an associate research librarian at Wuhan Library, Chinese Academy of Sciences. Her main research interest is the identification of technical opportunities in disciplinary fields. In this paper, she is responsible for data analysis. E-mail: conawang0206@gmail.com
    MA Tingcan is a research librarian and master's supervisor at Wuhan Library, Chinese Academy of Sciences. His current research interests include scientometrics and informetrics. In this paper, he is responsible for research guidance. E-mail: matc@whlib.ac.cn
  • Funding:
    Youth Innovation Promotion Association, Chinese Academy of Sciences talent program (2016160)

Discovering Interdisciplinary Research Based on Word Embedding

HE Tao1,*, WANG Guifang2, MA Tingcan2

  1. Department of Information Security, Naval University of Engineering, Wuhan, Hubei 430033, China
  2. Wuhan Library, Chinese Academy of Sciences, Wuhan, Hubei 430071, China
  • Received: 2021-10-26 Online: 2021-12-20 Published: 2022-01-26
  • Contact: HE Tao

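Abstract:

[Objective] Interdisciplinary research content has driven major scientific discoveries, so researchers need to know what interdisciplinary content is emerging in their own research directions. As the scientific literature keeps growing, finding interdisciplinary research content by reading papers manually is becoming increasingly difficult, and computer assistance is needed to reveal it. [Methods] This paper proposes a method for automatically identifying interdisciplinary research content. It exploits the semantic distribution properties of word embeddings from artificial intelligence: on top of word embeddings built for about 1.7 million common natural-science terms, the computer automatically identifies semantically anomalous terms among the author keywords of papers, and these terms point to interdisciplinary research content. [Results] Applied to the research direction of deep learning, the method uncovered a number of interdisciplinary natural-science research topics. [Limitations] Because traditional word embeddings are weak at representing polysemous words, the accuracy of identifying semantically anomalous author keywords still needs to improve. [Conclusions] The method offers a new approach to discovering interdisciplinary research content.

Key words: interdisciplinary research, word embedding, semantic anomaly, deep learning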

Abstract:

[Objective] Interdisciplinary research has driven many major scientific discoveries, so researchers need to identify interdisciplinary problems in their own research fields. Because a huge number of scientific papers are now published, discovering interdisciplinary research manually is difficult, and automatic methods are needed. [Methods] This paper proposes an approach to discovering interdisciplinary research automatically. The approach builds on word embeddings, a technique from artificial intelligence, trained to cover approximately 1.7 million natural-science terms. It captures interdisciplinary research by automatically identifying semantically anomalous keywords among the author keywords of papers. [Results] The method is applied to the research field of deep learning and discovers a number of interdisciplinary natural-science research topics. [Limitations] Because traditional word embeddings represent words with multiple meanings poorly, the accuracy of recognizing semantically anomalous keywords needs to be improved. [Conclusions] The proposed method offers a novel solution to the discovery of interdisciplinary research.

Key words: interdisciplinary research, word embedding, semantic anomaly, deep learning
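
The abstract does not spell out how a keyword is judged semantically anomalous, so the following is only a minimal sketch under stated assumptions: each author keyword is scored by its mean cosine similarity to the paper's other keywords, and a keyword is flagged when that score falls below a cutoff. The function names, the `threshold` value, and the three-dimensional toy vectors are all hypothetical; in practice the vectors would come from the roughly 1.7 million-term embedding described above.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def anomalous_keywords(keywords, embeddings, threshold=0.5):
    """Flag keywords whose mean similarity to the paper's other keywords is low.

    Assumption: "semantic anomaly" is approximated by a low mean cosine
    similarity; the paper's actual criterion may differ.
    """
    # Keywords missing from the embedding vocabulary are skipped.
    known = [k for k in keywords if k in embeddings]
    flagged = []
    for k in known:
        others = [o for o in known if o != k]
        if not others:
            continue
        mean_sim = sum(cosine(embeddings[k], embeddings[o]) for o in others) / len(others)
        if mean_sim < threshold:
            flagged.append((k, mean_sim))
    return flagged


# Hypothetical toy vectors standing in for a real word-embedding model.
toy_embeddings = {
    "deep learning": [0.9, 0.1, 0.0],
    "neural network": [0.8, 0.2, 0.0],
    "convolution": [0.85, 0.15, 0.05],
    "protein folding": [0.05, 0.1, 0.95],
}
paper_keywords = ["deep learning", "neural network", "convolution", "protein folding"]
flagged = anomalous_keywords(paper_keywords, toy_embeddings)
for keyword, score in flagged:
    print(f"{keyword}: mean similarity {score:.2f}")
```

In this toy example only "protein folding" sits far from the other keywords, so it is the one candidate for interdisciplinary content. A fixed cutoff is a deliberate simplification; ranking keywords by score across a whole corpus would be an equally plausible design.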