Frontiers of Data and Computing ›› 2023, Vol. 5 ›› Issue (5): 107-118.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.05.009

doi: 10.11871/jfdc.issn.2096-742X.2023.05.009

• Technology and Application •

Research on Topic Recognition and Analysis Based on LDA and Move Tagging

ZHANG Hui, CHUAN Limin*, ZHENG Huaiguo, ZHAO Jingjuan, QI Shijie

  1. Institute of Data Science and Agricultural Economics, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  • Received: 2022-09-22 Online: 2023-10-20 Published: 2023-10-31

Abstract:

[Objective] This paper presents a topic analysis method that combines the Latent Dirichlet Allocation (LDA) model with move tagging. It approaches the task from two dimensions, topic representation word extraction and topic sentence function classification, and examines the effectiveness and practicality of the method. [Methods] The LDA model is used to identify topics, and a Sentence Transformer model is used to extract subject phrases. In parallel, a sentence function classification model is built to perform move tagging: it identifies the functional type of each sentence so that topic content can be analyzed from the perspective of sentence function. [Results] An empirical study on papers in the field of agricultural resources and environment shows that, compared with the traditional LDA model, the topic representation words identified by the method are more readable and interpretable, and that combining them with move tagging allows a more in-depth content analysis of topic sentences. [Limitations] Some of the subject phrases obtained through expansion carry the same meaning as one another. The method needs further improvement by merging subject phrases that have the same meaning. [Conclusions] The proposed method performs well on topic representation word extraction and topic content analysis, and can improve the efficiency and depth of text topic mining.

Key words: LDA model, move tagging, subject phrase, subject analysis
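
To make the topic identification step in the Methods more concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the authors' implementation: the toy corpus, the hyperparameters `alpha` and `beta`, the iteration count, and the helper names `lda_gibbs` and `top_words` are all assumptions made for illustration. The Sentence Transformer phrase extraction and the move tagging classifier are not covered here.

```python
import random
from collections import Counter

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (illustrative)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    # Count tables: document-topic, topic-word, and per-topic token totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized full conditional p(z = k | all other assignments).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta) / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    # Smoothed topic-word distributions; each one sums to 1 over the vocabulary.
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]
    return phi, doc_topic

def top_words(phi_k, n=3):
    """Return the n highest-probability words of one topic."""
    return [w for w, _ in sorted(phi_k.items(), key=lambda kv: -kv[1])[:n]]

# Hypothetical toy corpus; the study itself uses agricultural research papers.
docs = [
    "soil nitrogen fertilizer soil yield".split(),
    "fertilizer nitrogen soil crop yield".split(),
    "water irrigation drought water river".split(),
    "irrigation water drought river basin".split(),
]
phi, doc_topic = lda_gibbs(docs, num_topics=2)
for k, phi_k in enumerate(phi):
    print(k, top_words(phi_k))
```

The single-word lists returned by `top_words` correspond to the output of the traditional LDA model that the Results use as the baseline. A real corpus would normally be handled with an established topic modeling library rather than a hand-written sampler.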