Frontiers of Data and Computing ›› 2023, Vol. 5 ›› Issue (4): 127-138.

CSTR: 32002.14.jfdc.CN10-1649/TP.2023.04.011

doi: 10.11871/jfdc.issn.2096-742X.2023.04.011

• Technology and Applications •

Chinese Sentiment Analysis Based on K-BERT and Residual Recurrent Units

WANG Guijiang, HUANG Runcai*, HUANG Bo

  1. School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
  • Received: 2022-04-18  Online: 2023-08-20  Published: 2023-08-23
  • Corresponding author: *HUANG Runcai (E-mail: hrc@sues.edu.cn)
  • About the authors: WANG Guijiang, Shanghai University of Engineering Science, master's student, CCF member; his main research interests include machine learning and natural language processing. In this paper, he was responsible for the project experiments, data collation, paper writing, and format correction.
    E-mail: guijiang_wang@163.com
    HUANG Runcai, Shanghai University of Engineering Science, associate professor, Ph.D.; his main research interests include machine learning, big data, and natural language processing. In this paper, he was responsible for the paper framework and experimental guidance.
    E-mail: hrc@sues.edu.cn
  • Funding:
    National Natural Science Foundation of China (61603242)


Abstract:

[Objective] Natural language processing technology can provide technical support for the security of network public opinion. To address two problems in text sentiment analysis, namely that recurrent neural networks cannot capture both deep and shallow feature information and that dynamic word vectors can deviate from the core semantics, this paper proposes a sentiment analysis model, K-BERT-BiRESRU-ATT, based on K-BERT and residual recurrent units. [Methods] First, the K-BERT model is used to obtain semantic feature vectors that incorporate background knowledge. Then, the proposed Bidirectional Residual Simple Recurrent Unit (BiRESRU) performs sequential extraction over the contextual features to obtain both deep and shallow feature information. Next, an attention mechanism strengthens the weights of keywords in the BiRESRU output. Finally, softmax is used to classify the results. [Results] The model achieves accuracy rates of 95.6% and 98.25% on the ChnSentiCorp and Weibo datasets, respectively, and reduces training time by nearly 5 minutes per iteration compared with other recurrent networks, improving computational efficiency. [Conclusions] K-BERT-BiRESRU-ATT resolves the deviation of dynamic word vectors from the core semantics, obtains both deep and shallow feature information, and accelerates model computation while improving classification accuracy, although it still demands substantial computing power.

Key words: simple recurrent unit, K-BERT, sentiment analysis, security of network public opinion
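The pipeline described in the abstract (encoder output → bidirectional residual SRU → attention-based keyword weighting → softmax) can be illustrated with a minimal NumPy sketch. All function names, dimensions, gate equations, and the exact placement of the residual connection are illustrative assumptions, not the authors' released implementation; the random matrices stand in for trained weights and for K-BERT token vectors.

```python
# Hypothetical sketch: K-BERT-style token vectors -> BiRESRU -> attention -> softmax.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru(x, Wx, Wf, bf, Wr, br):
    """One-direction Simple Recurrent Unit over a (T, d) sequence."""
    T, d = x.shape
    c = np.zeros(d)
    h = np.empty((T, d))
    for t in range(T):
        f = sigmoid(x[t] @ Wf + bf)               # forget gate
        r = sigmoid(x[t] @ Wr + br)               # highway (reset) gate
        c = f * c + (1.0 - f) * (x[t] @ Wx)       # light recurrence on the cell state
        h[t] = r * np.tanh(c) + (1.0 - r) * x[t]  # highway output
    return h

def biresru(x, fw, bw):
    """Bidirectional residual SRU: add the input back to each direction
    (the assumed 'residual' part), then concatenate the two directions."""
    h_fw = sru(x, *fw) + x
    h_bw = sru(x[::-1], *bw)[::-1] + x
    return np.concatenate([h_fw, h_bw], axis=-1)  # (T, 2d)

def attention_pool(h, w):
    """Keyword weighting: softmax scores over time steps, then weighted sum."""
    s = h @ w
    a = np.exp(s - s.max())
    a /= a.sum()
    return a @ h                                  # (2d,)

def softmax_classify(v, Wc, bc):
    z = v @ Wc + bc
    p = np.exp(z - z.max())
    return p / p.sum()

d, T = 8, 5
params = lambda: tuple(rng.normal(scale=0.1, size=s)
                       for s in [(d, d), (d, d), (d,), (d, d), (d,)])
x = rng.normal(size=(T, d))                       # stand-in for K-BERT token vectors
h = biresru(x, params(), params())
v = attention_pool(h, rng.normal(size=2 * d))
probs = softmax_classify(v, rng.normal(scale=0.1, size=(2 * d, 2)), np.zeros(2))
```

The highway connection inside the SRU cell already passes shallow (input-level) information forward; adding the residual skip around each direction is one plausible way to combine deep and shallow features as the abstract describes.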