Frontiers of Data and Computing ›› 2022, Vol. 4 ›› Issue (5): 98-107.

CSTR: 32002.14.jfdc.CN10-1649/TP.2022.05.011

doi: 10.11871/jfdc.issn.2096-742X.2022.05.011

• Technology and Application •

Research Progress in Data-Driven Threat Hunting Language Models

ZHANG Runzi1,*, KANG Bin2

  1. NSFOCUS Information Technology Co., Ltd., Beijing 100089, China
  2. Unit 96941 of PLA, Beijing 100085, China
  • Received: 2021-08-27 Online: 2022-10-20 Published: 2022-10-27
  • Contact: ZHANG Runzi
  • About the author: ZHANG Runzi, Ph.D., is a senior security researcher at NSFOCUS Information Technology Co., Ltd. and a postdoctoral researcher at Tsinghua University. His main research interests include intelligent security operations (AISecOps), threat hunting, and explainable AI.
    In this paper, he is responsible for the overall drafting, the conception and design of the article, the literature survey, and manuscript writing.
    E-mail: runzi_zhang@163.com
  • Funding:
    China Postdoctoral Science Foundation (2020M670181)

Abstract:

[Objective] This paper reviews research progress on data-driven language models for proactive threat hunting, and offers a technical outlook for the detection and source tracing of advanced threats. [Methods] Drawing on recent academic and industrial advances in cybersecurity, it surveys related work at several levels: the construction of evaluation metrics for threat hunting; the fusion of multi-modal, multi-dimensional, and multi-source data; the mitigation and analysis of dependency explosion; and the modeling of multi-modal threat hunting languages. [Results] Based on the key requirements of threat hunting and related technology trends, the paper summarizes the current state and trends of research on data-driven threat hunting and threat hunting language models along dimensions such as supported data types, pattern types, modeling methods, and real-time capability. [Conclusions] Detection and forensics scenarios involving highly adversarial threats such as APTs call for work on two fronts. First, a fused data infrastructure for multi-source heterogeneous data must be built, and the dependency explosion problem in that data must be solved. Second, standardized and flexible language models still need to be explored to support the unified analysis of multi-modal, multi-source, and multi-dimensional data.

Key words: threat hunting, security operations, advanced persistent threats