Frontiers of Data and Computing, 2023, Vol. 5, Issue (3): 92-110.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.03.007

doi: 10.11871/jfdc.issn.2096-742X.2023.03.007

Technology and Application

Progress in Research and Application of Text Embedding Technology

ZHAO Yueyang1,*, CUI Lei2

1. Library of Shengjing Hospital, China Medical University, Shenyang, Liaoning 110004, China
2. Health Management School, China Medical University, Shenyang, Liaoning 110122, China
Received: 2022-02-21   Online: 2023-06-20   Published: 2023-06-21

Abstract:

[Objective] This article analyzes and compares research on text embedding. It describes the basic text embedding models and the methods used to improve them for different domains and datasets, discusses popular embedding models, and compares their advantages and disadvantages. [Methods] Relevant Chinese and international literature on text embedding is retrieved from the Web of Science, CNKI, and WanFang databases, and the text embedding techniques, improvement schemes, and modeling ideas it reports are systematically analyzed. [Results] After deduplication and merging, the 61 most relevant documents are retained. Text embedding methods fall into three categories: frequency-based text embedding, neural-network-based text embedding, and topic-model-based text embedding. For the challenges that text embedding faces, such as corpus size, the embedding of polysemous words, and domain adaptation of general-purpose embeddings, possible solutions are extracted from the surveyed articles.
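As a rough illustration of the first category, frequency-based text embedding, here is a minimal sketch of TF-IDF, one common frequency-based scheme. It is not taken from the surveyed articles. The lowercased whitespace tokenizer, the unsmoothed idf = log(N / df), and the helper names `tfidf_vectors` and `cosine` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Embed each document as a TF-IDF vector over the shared vocabulary.

    Assumptions (illustrative only): lowercased whitespace tokenization,
    term frequency = count / document length, idf = log(N / df), no smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term at least once.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        length = len(toks) or 1  # avoid division by zero for empty documents
        # One dimension per vocabulary term: normalised frequency times idf.
        vectors.append([counts[term] / length * idf[term] for term in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    docs = [
        "text embedding maps text to vectors",
        "topic models embed documents",
        "neural networks learn text embedding",
    ]
    vocab, vecs = tfidf_vectors(docs)
    print(len(vocab), round(cosine(vecs[0], vecs[2]), 3))
```

Each vector has one dimension per vocabulary term, so its length grows with the vocabulary of the corpus, and a term that appears in every document receives an idf of zero and contributes nothing to the embedding.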

Key words: text embedding, natural language processing, content analysis