Frontiers of Data and Computing, 2022, Vol. 4, Issue (5): 120-128.

CSTR: 32002.14.jfdc.CN10-1649/TP.2022.05.013

doi: 10.11871/jfdc.issn.2096-742X.2022.05.013

• Technology and Application •

Research on Military Domain Named Entity Recognition Based on Pre-Training Model

TONG Zhao*, WANG Ludi, ZHU Xiaojie, DU Yi

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  • Received: 2021-12-14 Online: 2022-10-20 Published: 2022-10-27
  • Contact: TONG Zhao E-mail: ztong@cnic.cn

Abstract:

[Objective] This paper addresses Named Entity Recognition (NER) for open-source, unstructured military-domain data. [Methods] It proposes an NER method based on the Bidirectional Encoder Representations from Transformers (BERT) model. The method first generates dynamic, character-level feature vectors from a self-built open-source military corpus, then extracts semantic features with a Bidirectional Long Short-Term Memory (BiLSTM) network, and finally selects the optimal label sequence with a Conditional Random Field (CRF). [Results] Experiments on a self-built open-source military dataset show that, compared with methods based on statistical models and neural networks, the proposed method improves accuracy by 8%, F-value by 11%, and recall by 10%. [Limitations] Because publicly annotated datasets for the open-source military domain are lacking at this stage, it has not yet been possible to train a BERT model on the open-source military corpus. [Conclusions] Nevertheless, the pre-trained-model-based method proposed in this paper alleviates, to some extent, the boundary delineation problem and the poor entity recognition performance that arise when datasets are insufficient.
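
The final step of the pipeline described in the abstract, selecting the optimal label sequence with a CRF, is commonly implemented with Viterbi decoding. The following is a minimal sketch of that decoding step only, not the authors' implementation. The label set (`O`, `B-ENT`, `I-ENT`), the emission scores (standing in for BiLSTM outputs), and the transition scores are all hypothetical values chosen for illustration. It shows how a transition penalty can correct an entity boundary that per-character scores alone would get wrong.

```python
from typing import List, Sequence


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of label indices.

    emissions[t][j]   : score of label j at position t (e.g. from a BiLSTM).
    transitions[i][j] : score of moving from label i to label j.
    """
    if not emissions:
        return []
    num_labels = len(emissions[0])
    score = list(emissions[0])          # best score ending in each label so far
    backpointers: List[List[int]] = []  # best previous label for each step

    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_labels):
            # Best previous label i for arriving at label j.
            best_prev = max(range(num_labels),
                            key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j]
                             + emissions[t][j])
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final label.
    path = [max(range(num_labels), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical label set and scores (assumptions, not taken from the paper).
LABELS = ["O", "B-ENT", "I-ENT"]
FORBIDDEN = -1e4  # large penalty: an entity may not start with I-ENT after O
transitions = [
    [0.0, 0.0, FORBIDDEN],  # from O
    [0.0, 0.0, 0.5],        # from B-ENT
    [0.0, 0.0, 0.5],        # from I-ENT
]
emissions = [
    [2.0, 0.5, 0.1],  # character 0
    [0.2, 1.0, 1.5],  # character 1: emission alone prefers I-ENT
    [0.1, 0.3, 2.0],  # character 2
    [1.8, 0.2, 0.4],  # character 3
]

best_path = viterbi_decode(emissions, transitions)
print([LABELS[i] for i in best_path])  # ['O', 'B-ENT', 'I-ENT', 'O']
```

Picking the highest emission score per character would yield `O, I-ENT, I-ENT, O`, an entity with no valid start. The transition scores steer the decoder to `O, B-ENT, I-ENT, O`, which is the kind of boundary correction a CRF layer contributes.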

Key words: Named Entity Recognition, Pre-trained Model, Neural Network