Frontiers of Data & Computing (数据与计算发展前沿) ›› 2023, Vol. 5 ›› Issue (2): 97-105.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.02.008

doi: 10.11871/jfdc.issn.2096-742X.2023.02.008

• Technology and Application •

Feature Enhancement Method for Short Text Based on Knowledge Graph and Topic Model

XU Songyuan1,2, LI Chengzan1, LIU Feng1,*

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-01-20 Online: 2023-04-20 Published: 2023-04-24
  • Contact: LIU Feng
  • About the authors:
    XU Songyuan is a master's student at the Computer Network Information Center, Chinese Academy of Sciences. His research interests include natural language processing and recommender systems.
    In this paper, he was responsible for data processing, model construction, experiments, and paper writing.
    E-mail: xusongyuan@cnic.cn
    LIU Feng, Ph.D., is a project researcher at the Computer Network Information Center, Chinese Academy of Sciences. He has long been engaged in research on scientific data management and sharing service technology and in platform construction. His main research directions are data fusion management and semantic association technology.
    In this paper, he was mainly responsible for organizing the framework of the article and revising key content.
    E-mail: liufeng@cnic.cn

Abstract:

[Objective] Chinese short texts suffer from feature sparsity, so constructing high-quality feature representations for them is important for downstream tasks such as short text classification and recommendation. [Methods] To address this problem, this paper proposes a feature enhancement method for short text based on a knowledge graph and a topic model. The method uses the knowledge graph to obtain external knowledge for feature expansion, uses the topic model to mine the semantic features of the short text, and finally generates a feature-enhanced vector through vector concatenation. [Conclusions] The proposed method is applied to a Chinese short text classification task. Comparative experiments show that it represents short texts better than the methods it was compared against.

Key words: text feature enhancement, topic model, short text classification, knowledge graph
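
The pipeline summarized in the abstract (knowledge-graph expansion, topic features, then vector concatenation) can be illustrated with a minimal sketch. The abstract does not say how each component is computed, so everything below is an assumption made for illustration: the vocabulary, the toy knowledge graph `TOY_KG`, the per-word topic weights `TOY_TOPIC_WORD` (standing in for a trained topic model), and the bag-of-words representation (swapped in for whatever text encoder the authors actually use). Only the overall shape, an expanded text vector concatenated with a topic vector, follows the abstract.

```python
from collections import Counter

# Hypothetical vocabulary for the toy example (not from the paper).
VOCAB = ["apple", "phone", "price", "company", "device", "market"]

# Hypothetical knowledge graph: token -> related concepts used for expansion.
TOY_KG = {"apple": ["company"], "phone": ["device"]}

# Hypothetical per-word topic weights over 2 topics; a stand-in for the
# output of a trained topic model, which the abstract does not detail.
TOY_TOPIC_WORD = {
    "apple": [0.7, 0.3],
    "phone": [0.9, 0.1],
    "price": [0.2, 0.8],
}


def expand_with_kg(tokens):
    """Append the concepts linked to each token in the toy knowledge graph."""
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(TOY_KG.get(token, []))
    return expanded


def bag_of_words(tokens):
    """Count vector over VOCAB (a simple stand-in for a learned encoder)."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in VOCAB]


def topic_vector(tokens, n_topics=2):
    """Sum the per-word topic weights and normalize to a distribution."""
    totals = [0.0] * n_topics
    for token in tokens:
        for k, weight in enumerate(TOY_TOPIC_WORD.get(token, [])):
            totals[k] += weight
    norm = sum(totals)
    if norm == 0.0:
        # No known words: fall back to a uniform topic distribution.
        return [1.0 / n_topics] * n_topics
    return [value / norm for value in totals]


def enhanced_vector(tokens):
    """Concatenate the KG-expanded text vector with the topic vector."""
    return bag_of_words(expand_with_kg(tokens)) + topic_vector(tokens)


if __name__ == "__main__":
    print(enhanced_vector(["apple", "phone", "price"]))
```

In this sketch the short text `["apple", "phone", "price"]` gains the concepts `company` and `device` from the toy graph before vectorization, and the two topic weights are appended at the end, so the result has `len(VOCAB) + 2` dimensions. A real system would replace each stand-in with a trained component.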