Frontiers of Data and Computing ›› 2022, Vol. 4 ›› Issue (5): 98-107.

CSTR: 32002.14.jfdc.CN10-1649/TP.2022.05.011

doi: 10.11871/jfdc.issn.2096-742X.2022.05.011

• Technology and Application •

Research Progress in Data-Driven Threat Hunting Language Models

ZHANG Runzi1,*, KANG Bin2

  1. NSFOCUS Information Technology Co., Ltd., Beijing 100089, China
  2. Unit 96941 of PLA, Beijing 100085, China
  • Received: 2021-08-27 Online: 2022-10-20 Published: 2022-10-27
  • Contact: ZHANG Runzi E-mail: runzi_zhang@163.com

Abstract:

[Objective] This paper summarizes research progress on data-driven language models for proactive threat hunting and offers a technical outlook for the detection and source tracing of advanced threats. [Methods] Drawing on frontier academic and industrial progress in cyber security, this study introduces and summarizes related research at several levels: the construction of evaluation metrics for threat hunting; the fusion of multi-modal, multi-dimensional, and multi-source data; the analysis and mitigation of dependency explosion; and the modeling of multi-modal threat hunting languages. [Results] Based on the key requirements of threat hunting and related technology trends, the paper comprehensively summarizes the research status and future directions of data-driven threat hunting and language models along dimensions including supported data types, model types, modeling methods, and timeliness. [Conclusions] For threat detection and forensic scenarios involving adversarial APT attacks, it is necessary, on the one hand, to build a data infrastructure that fuses multi-source heterogeneous data and to solve the problem of data dependency explosion. On the other hand, standardized and flexible language models still need to be explored to support the unified analysis of multi-modal, multi-source, and multi-dimensional data.

Key words: threat hunting, security operations, advanced persistent threats