Frontiers of Data & Computing, 2024, Vol. 6, Issue (4): 106-115.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.04.009

doi: 10.11871/jfdc.issn.2096-742X.2024.04.009

• Special Issue: Basic Software Stack and Systems for National Science Data Centers •

Research on Knowledge Extraction Method for Agricultural Science and Technology Policies Based on Deep Learning

ZHAO Xiaodan, HU Lin*

  1. Agricultural Information Institute of CAAS, Beijing 100081, China
  • Received: 2024-01-05 Online: 2024-08-20 Published: 2024-08-20
  • Corresponding author: *HU Lin (E-mail: hulin@caas.cn)
  • About the authors: ZHAO Xiaodan is a master's student at the Agricultural Information Institute of the Chinese Academy of Agricultural Sciences. Her research interests include science and technology policy and knowledge graphs.
    In this paper, she is responsible for manuscript writing and experimental verification.
    E-mail: 17835074257@163.com
    HU Lin is a researcher and master's supervisor at the Agricultural Information Institute of the Chinese Academy of Agricultural Sciences. His main research areas include data science, agricultural information technology, and smart agriculture.
    In this paper, he is responsible for writing guidance and final review of the manuscript.
    E-mail: hulin@caas.cn
  • Funding:
    National Key Research and Development Program of China, "Application Demonstration for Integrated Science Scenarios" (2021YFF0704204)

Abstract:

[Application Background] Agricultural science and technology policies have a significant impact on technological progress and the development of agricultural production, and policies issued by different government departments are related to one another through the conceptual entities they address. [Objective] Named entity recognition and relation extraction for agricultural science and technology policies rely heavily on manually designed features, which is time-consuming and labor-intensive. To address this problem, this study proposes a knowledge extraction method based on the BERT-BiLSTM-CRF model. [Method] A new annotation scheme, tailored to the characteristics of the domain corpus, models triples directly instead of using traditional joint extraction or separate modeling, and thereby turns entity and relation recognition into a sequence labeling problem. The experiments use policy text totaling 19,779 sentences and 376,721 characters, and recognize 8 entity types (such as policy and industry) and 10 relation types (such as citation and publication). [Results] The BERT-BiLSTM-CRF model achieves a precision of 81.61%, a recall of 85.34%, and an F1 score of 83.47% on the corpus. The experimental results show that the method effectively extracts entities and relations from agricultural science and technology policies and outperforms other classical models.

Key words: agricultural science and technology policies, BERT-BiLSTM-CRF, knowledge extraction, entity recognition
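
The abstract does not spell out the annotation scheme, so the following is only a minimal sketch of how triple extraction can be cast as sequence labeling. It assumes a hypothetical tag format `<position>-<relation>-<role>`, where position is `B` or `I`, role `1` marks the head entity and role `2` the tail entity, and `O` marks everything else. The function names, the tag format, and the placeholder sentence are illustrative assumptions, not the authors' scheme.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[str, str, str]    # (entity text, relation, role)
Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def decode_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Group consecutive tagged tokens into entity spans (hypothetical tag format)."""
    spans: List[Span] = []
    current: Optional[Tuple[List[str], str, str]] = None

    def flush() -> None:
        # Close the span being built, if any, and store it.
        nonlocal current
        if current is not None:
            parts, relation, role = current
            spans.append((" ".join(parts), relation, role))
            current = None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        position, rest = tag.split("-", 1)
        relation, role = rest.rsplit("-", 1)
        continues = (
            position == "I"
            and current is not None
            and (current[1], current[2]) == (relation, role)
        )
        if continues:
            current[0].append(token)
        else:
            flush()
            current = ([token], relation, role)
    flush()
    return spans


def spans_to_triples(spans: List[Span]) -> List[Triple]:
    """Pair each tail span with the most recent head span of the same relation."""
    triples: List[Triple] = []
    last_head: Dict[str, str] = {}
    for text, relation, role in spans:
        if role == "1":
            last_head[relation] = text
        elif role == "2" and relation in last_head:
            triples.append((last_head[relation], relation, text))
    return triples


# Placeholder sentence; a trained tagger would predict the tags.
tokens = ["Ministry", "A", "issued", "Policy", "X"]
tags = ["B-publication-1", "I-publication-1", "O",
        "B-publication-2", "I-publication-2"]
triples = spans_to_triples(decode_spans(tokens, tags))
```

Under these assumptions, a sequence labeler such as BERT-BiLSTM-CRF only has to predict one tag per token, and the triples are recovered by a deterministic decoding step like the one above. The nearest-head pairing rule is one simple design choice; a real scheme would also need to handle entities that take part in several relations.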