Frontiers of Data and Computing, 2022, Vol. 4, Issue (6): 118-128.

CSTR: 32002.14.jfdc.CN10-1649/TP.2022.06.011

doi: 10.11871/jfdc.issn.2096-742X.2022.06.011

• Technology and Application •

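An Improved Sentiment Analysis Model Incorporating Textual Topic Features

ZHANG Shuai, HUANG Bo*, JU Jiaji

  1. School of Electrical and Electronic Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2021-12-20  Online: 2022-12-20  Published: 2022-12-20
  • Corresponding author: HUANG Bo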
  • About the authors: ZHANG Shuai is a master's student in the School of Electrical and Electronic Engineering, Shanghai University of Engineering Science. His main research interests are natural language processing, sentiment analysis, and text classification.
    In this paper, he proposed the research ideas, conducted the experiments on the designed model, and wrote the paper.
    E-mail: 854400656@qq.com
    HUANG Bo is an associate professor and master's supervisor in the School of Electrical and Electronic Engineering, Shanghai University of Engineering Science. His main research interests include software engineering, artificial intelligence, big data, and natural language processing.
    In this paper, he designed the research plan and framework, was responsible for the overall synthesis, and revised the final version of the paper.
    E-mail: huangbosues@sues.edu.cn
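  • Funding:
    National Key Research and Development Program of China (2020AAA0109300); Open Project of the Shanghai Key Laboratory of Integrated Administration Technologies for Information Security (AGK2019004)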

Abstract:

[Objective] Massive volumes of user reviews are of great value to consumers and businesses, but reviews are often so short that they suffer from data sparsity, unclear topics, and low classification accuracy. This paper addresses these problems. [Methods] This paper proposes TSC-BiLSTM, an online review sentiment analysis model that combines a bidirectional LSTM (Bi-LSTM) with a self-attention mechanism and incorporates topic features. Unlike traditional LSTM-based methods, it uses the Latent Dirichlet Allocation (LDA) topic model to obtain the topic-word distribution of each review, concatenates this distribution with the review's word vectors to form the input, extracts features from the full text with a Bi-LSTM, and uses self-attention to assign weights dynamically. [Results] The model expands the feature space of short review texts, reduces data sparsity, makes topics explicit, and improves sentiment classification accuracy. [Conclusions] Experiments on a hotel review dataset and a review dataset from a food-delivery platform show that the proposed method outperforms related models, offering a new approach to topic-based sentiment analysis.

Key words: LDA, topic words, Bi-LSTM, self-attention, sentiment analysis
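The [Methods] summary compresses several steps into one sentence. The sketch below illustrates three of them in plain Python: appending a review's LDA topic distribution to each word vector, concatenating forward and backward LSTM states as a Bi-LSTM does, and pooling those states with attention weights before a sentiment softmax. It is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation. The per-token concatenation scheme, the additive attention form (u_t = tanh(W h_t + b), weights from a softmax over u_t · context), the linear output head, and every function name are assumptions. The LDA model and the LSTM cells themselves are not implemented; their outputs are supplied as toy values.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def concat_topic_features(word_vectors: Matrix, topic_dist: Vector) -> Matrix:
    """Append the review's topic distribution to every word vector.

    Assumption: the same document-level LDA topic vector is concatenated to
    each token embedding; the paper's exact scheme may differ.
    """
    return [list(w) + list(topic_dist) for w in word_vectors]


def bidirectional_states(forward: Matrix, backward: Matrix) -> Matrix:
    """Concatenate forward and backward LSTM outputs at each time step.

    `backward[t]` is assumed to already be aligned with token t.
    """
    return [list(f) + list(b) for f, b in zip(forward, backward)]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W: Matrix, x: Vector) -> Vector:
    # Plain matrix-vector product: one dot product per row of W.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]


def attention_pool(hidden: Matrix, W: Matrix, b: Vector,
                   context: Vector) -> Tuple[Vector, Vector]:
    """Additive attention pooling (hypothetical parameterisation).

    u_t = tanh(W h_t + b), score_t = u_t . context,
    alpha = softmax(scores), pooled = sum_t alpha_t * h_t.
    """
    scores = []
    for h in hidden:
        u = [math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)]
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    alpha = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(a * h[k] for a, h in zip(alpha, hidden)) for k in range(dim)]
    return alpha, pooled


def classify(pooled: Vector, W_out: Matrix, b_out: Vector) -> Vector:
    """Linear layer plus softmax over sentiment classes (assumed output head)."""
    return softmax([z + bi for z, bi in zip(matvec(W_out, pooled), b_out)])


if __name__ == "__main__":
    # Toy values, all hypothetical: 2-dim word vectors and a 2-topic LDA distribution.
    inputs = concat_topic_features([[0.1, 0.2], [0.3, 0.4]], [0.7, 0.3])
    print(len(inputs[0]))  # 2 word features + 2 topic features = 4
    # Stand-ins for LSTM outputs; a real model would compute these from `inputs`.
    hidden = bidirectional_states([[0.5], [0.1]], [[0.2], [0.9]])
    alpha, pooled = attention_pool(hidden, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 1.0])
    probs = classify(pooled, [[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0])
    print(alpha, probs)
```

Because the attention weights `alpha` sum to 1, the pooled vector is a weighted average of the per-token states; this is one common way to realise the dynamic weight assignment the abstract describes, as opposed to using only the final LSTM state.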