Frontiers of Data & Computing ›› 2020, Vol. 2 ›› Issue (6): 21-29.

doi: 10.11871/jfdc.issn.2096-742X.2020.06.003

• Special Topic: Distinctive Applications of High-Performance Computing in Industry Sectors •

Sentiment Classification of Microblog Short Text Based on Feature Fusion

CHEN Tao, AN Junxiu

  1. Institute of Parallel Computing and Big Data, Chengdu University of Information Technology, Chengdu, Sichuan 610225, China
  • Received: 2020-07-23 Online: 2020-12-20 Published: 2020-12-29
  • Contact: AN Junxiu
  • About the authors:
    CHEN Tao is an MS student at the Institute of Parallel Computing and Big Data, Chengdu University of Information Technology. His research interests include big data and parallel computing, data mining, and natural language processing. In this paper, he is responsible for paper writing, algorithm design, and experimental verification. E-mail: 1147281441@qq.com
    AN Junxiu is a professor at Chengdu University of Information Technology and the Director of the Institute of Parallel Computing and Big Data. Her main research interests include big data and parallel computing, data mining, and social computing. In this paper, she is responsible for paper guidance and the review of domestic and international research. E-mail: anjunxiu@cuit.edu.cn
  • Funding:
    National Natural Science Foundation of China, "Research on the Mechanism by Which Social Networking Site Use Affects Individual Subjective Well-Being" (71673032)

Abstract:

[Objective] With the rapid development of information technology and the Internet, analyzing microblogs and other short public texts has become important for studying online public opinion. Chinese short texts carry little information and have sparse features, so this paper studies sentiment classification of microblog short text. The aim is to improve feature extraction from microblog short text in order to support the prediction of online public opinion. [Methods] To better extract sentiment features from microblog short text, this paper first uses the BERT (Bidirectional Encoder Representations from Transformers) model to vectorize the text, then uses a convolutional neural network (CNN) to extract local semantic features of the text, and finally fuses the local semantic feature vector with the feature vector produced by BERT. This method effectively addresses the difficulty of feature extraction from short text. [Results] The experimental results show that, when the text vectors extracted by this method are fed into an LSTM text classification model, the classification accuracy of the proposed model is 1.24% higher than that of the BiLSTM+CNN+Attention model, 3.22% higher than that of the BiLSTM+Attention model, 5.24% higher than that of the LSTM+Attention model, 6.46% higher than that of the Text-CNN model, and 8.47% higher than that of SVM. [Conclusions] The proposed feature fusion model effectively improves the accuracy of text sentiment classification.

Key words: short text, sentiment analysis, neural network, feature fusion
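
The feature-fusion step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: random vectors stand in for BERT token embeddings, the first token vector is assumed to serve as the BERT sentence feature, fusion is assumed to be plain concatenation (the abstract does not name the operator), and the downstream LSTM classifier is omitted. All function names and sizes are hypothetical.

```python
import random


def conv1d_max_pool(tokens, kernel, bias=0.0):
    """Apply one convolution filter over the token sequence, then max-pool.

    tokens: list of token vectors, shape (seq_len, dim)
    kernel: list of weight vectors, shape (width, dim)
    """
    width = len(kernel)
    if len(tokens) < width:
        raise ValueError("sequence is shorter than the filter width")
    activations = []
    for start in range(len(tokens) - width + 1):
        window = tokens[start:start + width]
        total = bias + sum(
            w * x
            for k_row, t_row in zip(kernel, window)
            for w, x in zip(k_row, t_row)
        )
        activations.append(max(0.0, total))  # ReLU
    return max(activations)  # max over time: one local feature per filter


def local_features(tokens, kernels):
    """Local semantic features: one pooled value per convolution filter."""
    return [conv1d_max_pool(tokens, kernel) for kernel in kernels]


def fuse(sentence_vector, cnn_features):
    """Fuse by concatenation (an assumption, see the lead-in)."""
    return list(sentence_vector) + list(cnn_features)


# Hypothetical sizes; real BERT embeddings are far larger.
rng = random.Random(0)
seq_len, dim, n_filters, width = 6, 4, 3, 2

# Stand-in for BERT output: one random vector per token.
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]
kernels = [
    [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(width)]
    for _ in range(n_filters)
]

sentence_vector = tokens[0]  # assumed sentence-level BERT feature
fused = fuse(sentence_vector, local_features(tokens, kernels))
print(len(fused))  # dim + n_filters = 7
```

In a full pipeline the fused vector would be passed to the classifier. A deep learning framework would replace the hand-written convolution, and the filter weights would be learned rather than sampled at random.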