Frontiers of Data and Computing ›› 2021, Vol. 3 ›› Issue (6): 60-80.

doi: 10.11871/jfdc.10-1649.2021.06.005

• Special Issue: Scientific Big Data Mining and Knowledge Discovery •

A Review of Tool Platforms and Key Technologies for Scientific and Technological Literature Mining

BAI Rujiang*, ZHAO Mengmeng, ZHANG Yujie, DONG Kun

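  1. Institute of Information Management, Shandong University of Technology, Zibo, Shandong 255000, China
  • Received: 2021-11-16 Online: 2021-12-20 Published: 2022-01-26
  • Corresponding author: BAI Rujiang*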
  • About the authors:
    BAI Rujiang, Ph.D., is a professor and master's supervisor in information science at the Institute of Information Management, Shandong University of Technology. He leads the "Science and Technology Big Data Research and Innovation Team" under the Shandong Province Higher Education Youth Innovation Talent Program and was selected for the third tier of the "ShuangBai Project" for high-level talent at Shandong University of Technology. He has led 3 National Social Science Fund projects, as well as projects with special funding and first-class funding from the China Postdoctoral Science Foundation. He has co-authored 2 monographs, published more than 70 papers, and applied for 1 computer software copyright. His research interests include S&T big data mining, S&T intelligence analysis, and smart intelligence perception.
    In this paper, he determined the overall research approach and designed the paper framework, wrote "1 Development history of S&T literature mining" and "5 Conclusion and Prospect", and carried out the final revision of the manuscript. E-mail: brj@sdut.edu.cn
    ZHAO Mengmeng is a master's student at the Institute of Information Management, Shandong University of Technology. Her research interests include text mining and S&T intelligence analysis.
    In this paper, she wrote "2 Current status of S&T literature mining tools" and "3 Current status of S&T literature mining system platforms". E-mail: zhaomeng199701@163.com
    ZHANG Yujie is a master's student at the Institute of Information Management, Shandong University of Technology. His research interests include text mining and S&T intelligence analysis.
    In this paper, he wrote "4 Key technologies and development trends of S&T literature mining". E-mail: zyj1725@163.com
    DONG Kun is a lecturer and master's supervisor at the Institute of Information Management, Shandong University of Technology. Her research interests include S&T intelligence analysis.
    In this paper, she drew some of the figures and tables and revised the manuscript. E-mail: dongkun@sdut.edu.cn
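  • Funding:
    National Social Science Fund of China, General Program, "Research on Smart Intelligence Perception Driven by Multi-Source Data Fusion" (21BTQ071)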

Review on Scientific Literature Mining: Tools and Technologies

BAI Rujiang, ZHAO Mengmeng, ZHANG Yujie, DONG Kun

  1. Institute of Information Management, Shandong University of Technology, Zibo, Shandong 255000, China
  • Received: 2021-11-16 Online: 2021-12-20 Published: 2022-01-26
  • Contact: BAI Rujiang

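Abstract (translated from the Chinese):

[Objective] This paper systematically reviews the main tools, system platforms, and key technologies of scientific and technological (S&T) literature mining and identifies future development trends, providing a reference for related research. [Methods] Through web and literature surveys, it traces the historical development of S&T literature mining, summarizes the main tools and system platforms and their characteristics, compares them along dimensions such as platform functions, data types, and visualization capabilities, and focuses on the key technologies of S&T literature mining and their frontiers. [Results] The paper addresses in detail the questions of "where to mine, which tools to mine with, and how to mine" in S&T literature mining. It finds that data sources are gradually moving toward multi-source data fusion and fine-grained knowledge organization; that constructing semantic knowledge graphs of S&T literature is a current research hotspot; that deep learning models such as graph neural networks, pre-trained models, and adversarial learning networks are the current frontier key technologies; and that causal inference methods are gradually becoming a frontier direction. [Conclusions] With the continued development of big data and artificial intelligence, S&T literature mining will draw on the dividends of data and technology to deliver greater value in specific application scenarios such as S&T intelligence decision-making.

Keywords: scientific and technological literature mining, literature mining tool platforms, key technologies of literature mining, frontier advances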

Abstract:

[Objective] This paper reviews the main tools, system platforms, and key technologies of scientific and technological literature mining, outlines future development trends, and provides a reference for related research. [Methods] Through literature review and web survey, this paper summarizes the historical roadmap of scientific and technological literature mining, introduces its main tools and system platforms and their characteristics, compares them in terms of platform function, data type, and processing ability, and focuses on the key technologies and the state of the art of scientific and technological literature mining. [Results] This paper addresses the questions of "where to mine, what tools to mine with, and how to mine" in scientific and technological literature mining. The data sources of scientific and technological literature mining are evolving towards multi-source data fusion and fine-grained knowledge organization. The construction of semantic knowledge graphs of scientific and technological literature is a hot topic. Graph neural networks, pre-trained models, and generative adversarial networks are the frontier key technologies. Causal inference methods are becoming a frontier direction. [Conclusions] With the continued in-depth development of big data and artificial intelligence, scientific and technological literature mining will deliver greater value in specific application scenarios such as scientific and technological information decision-making, with the help of data and technology dividends.

Key words: scientific literature mining, literature mining tool platform, key technologies of literature mining, research frontiers