Frontiers of Data and Computing ›› 2023, Vol. 5 ›› Issue (4): 86-100.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.04.008

doi: 10.11871/jfdc.issn.2096-742X.2023.04.008

• Technology and Application •

Review of Automatic Citation Classification Based on Deep Learning Technology

LI JunFei1,2, XU LiMing1,2, WANG Yang1,2,*, WEI Xin1

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
    2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-01-20 Online: 2023-08-20 Published: 2023-08-23

Abstract:

[Objective] Citation classification of scientific and technological literature underpins academic influence evaluation as well as literature retrieval and recommendation. With the development of deep neural networks and pre-trained language models, research on citation classification has made substantial progress, and many deep-learning-based models, datasets, and methods have been proposed. However, a comprehensive survey of existing methods and the latest trends is still lacking. This paper fills that gap. [Methods] This paper surveys deep-learning-based citation classification models and datasets for scientific and technological literature, compares the performance, advantages, and disadvantages of different models, summarizes citation classification techniques for scientific and technological literature, and discusses future development directions. [Results] Classification models built on pre-trained language models can effectively learn global semantic representations. They alleviate the low training efficiency of RNNs (Recurrent Neural Networks) and the limited range of sequence dependencies that CNNs (Convolutional Neural Networks) can extract from text, and they significantly improve classification accuracy. [Limitations] This paper mainly reviews progress in citation classification for scientific and technological literature and does not attempt a comprehensive forecast of future technical directions.

Key words: citation classification of scientific and technological documents, pre-trained language model, deep learning, natural language processing