Frontiers of Data and Computing ›› 2021, Vol. 3 ›› Issue (2): 103-111.

doi: 10.11871/jfdc.issn.2096-742X.2021.02.012

• Technology and Application •

Automatic Summarization of e-Government Documents Based on Sentence Vector Representation and Fuzzy C-Means

QI Rongling1,2, JIAO Wenbin1,*, WANG Yang1

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2020-10-23 Online: 2021-04-20 Published: 2021-05-18
  • Corresponding author: JIAO Wenbin
  • About the authors:
    QI Rongling is a master's student at the Computer Network Information Center, Chinese Academy of Sciences, and the University of Chinese Academy of Sciences. Her research interests include natural language processing and intelligent applications.
    In this paper she was responsible for algorithm implementation and paper writing.
    E-mail: qirongling@cnic.cn
    JIAO Wenbin, master, is a professor-level Senior Engineer at the Computer Network Information Center, Chinese Academy of Sciences. His research interests include e-government and data intelligence applications.
    In this paper he was responsible for organizing the paper structure and revising the paper.
    E-mail: wbjiao@cnic.cn
    WANG Yang, Ph.D., is a Senior Engineer at the Computer Network Information Center, Chinese Academy of Sciences. His research interests include big data analysis and informatization strategy.
    In this paper he was responsible for revising the paper.
    E-mail: wangyang@cnic.cn
  • Funding:
    Chinese Academy of Sciences Informatization Project "Smart CAS Construction Promotion Project: Academy-wide Research and Education Situation Awareness Service" (XXH13504-03)

Abstract:

[Objective] With the development of "Internet + e-Government", China is paying increasing attention to informatization. Government decision-makers, managers, informatization practitioners, and researchers urgently need a fast and effective way to digest the large volume of e-government news in order to guide informatization evaluation and decision-making. This paper studies an automatic summarization algorithm suited to e-government documents. [Methods] Based on the characteristics of e-government news text, this paper proposes an algorithm that combines Doc2Vec sentence vector representation with fuzzy c-means clustering and applies it to automatic summary generation for e-government news documents. The algorithm not only considers the relatedness between sentences but also assigns each sentence a weight, derived from the characteristics of the article, that expresses its importance as a summary sentence. [Results] Experiments show that, compared with the commonly used k-means algorithm and with more complex deep learning algorithms, the proposed algorithm achieves better results in automatic summary generation for e-government news documents. [Conclusions] The proposed algorithm is effective for automatic summarization in the e-government domain, and studying automatic summarization and applying it to e-government is valuable work.

Key words: automatic summarization, e-government, Doc2Vec, fuzzy c-means (fuzzy clustering), informatization evaluation
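
The abstract names the building blocks of the method but gives no formulas, so the following Python sketch is only one plausible reading of it and not the authors' implementation. It assumes the sentence vectors are already available (in the paper they come from Doc2Vec; here toy 2-D vectors stand in for them), uses the textbook fuzzy c-means updates with fuzzifier m = 2, and takes a hypothetical list of per-sentence weights. The selection rule (per cluster, pick the sentence with the highest weight × membership) and the names `fuzzy_c_means` and `select_summary` are illustrative assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means. Returns (centers, memberships).

    memberships[i][j] is the degree to which point i belongs to cluster j;
    every row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(iters):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Memberships are proportional to distance ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dist = [max(math.dist(p, ctr), 1e-12) for ctr in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            new_u.append([v / total for v in inv])

        shift = max(abs(new_u[i][j] - u[i][j])
                    for i in range(n) for j in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


def select_summary(vectors, weights, c, **kwargs):
    """Pick one sentence per cluster (assumed rule: max weight * membership)."""
    _, u = fuzzy_c_means(vectors, c, **kwargs)
    chosen = set()
    for j in range(c):
        best = max(range(len(vectors)), key=lambda i: weights[i] * u[i][j])
        chosen.add(best)
    return sorted(chosen)  # indices in original document order


# Toy stand-ins for Doc2Vec sentence vectors: two well-separated topics.
vectors = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
           [5.0, 5.1], [5.1, 5.0], [5.05, 5.05]]
# Hypothetical importance weights (e.g. earlier sentences count more).
weights = [1.0, 0.8, 0.6, 0.9, 0.7, 0.5]

summary = select_summary(vectors, weights, c=2)
print(summary)
```

Unlike k-means, which assigns each sentence to exactly one cluster, the membership matrix here lets a sentence belong partly to several topics, which is what makes it possible to combine cluster membership with a separate importance weight in a single score.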