Frontiers of Data & Computing, 2024, Vol. 6, Issue (2): 56-66.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.02.006

doi: 10.11871/jfdc.issn.2096-742X.2024.02.006

• Special Issue: 30th Anniversary of China's Full-Function Access to the International Internet

Email Masquerade Attack Detection Based on Semi-Supervised Learning

LI Chang1,2, LONG Chun1,*, ZHAO Jing1, YANG Yue1, WANG Yueda1, PAN Qingfeng3, YE Xiaohu4, WU Tiejun5, TANG Ning6

  1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
  2. University of Chinese Academy of Sciences, Beijing 100101, China
  3. Guangdong Coremail Technology Ltd., Guangzhou, Guangdong 511400, China
  4. NSFOCUS Technologies Group Co., Ltd., Beijing 100089, China
  5. School of Cyber Science and Engineering, Southeast University, Nanjing, Jiangsu 211189, China
  6. Topsec Technologies Group Inc., Beijing 100193, China
  • Received: 2024-03-27 Online: 2024-04-20 Published: 2024-04-26
  • Corresponding author: *LONG Chun (E-mail: longchun@cnic.cn)
  • About the authors:
    LI Chang, doctoral researcher at the Computer Network Information Center, Chinese Academy of Sciences (University of Chinese Academy of Sciences). Research interests: information security, anomaly detection, and deep learning.
    Contribution to this paper: experimental design and manuscript writing.
    E-mail: cli@cnic.cn
    LONG Chun, professor-level senior engineer at the Computer Network Information Center, Chinese Academy of Sciences. His research interests include information security and data protection. He has published more than 10 academic papers in major journals and conferences in China and abroad, holds 3 patents, and has led several national-level projects.
    Contribution to this paper: guidance on the experimental approach.
    E-mail: longchun@cnic.cn
  • Funding:
    National Key Research and Development Program of China, "Research on Security Risk Assessment, Monitoring, and Traceability Technologies for Full-Lifecycle Financial Data Circulation" (2023YFC3304704); Chinese Academy of Sciences Cybersecurity and Informatization Fund, "Cybersecurity Assurance System Construction Project" (CAS-WX2022GC-04); Strategic Priority Research Program of the Chinese Academy of Sciences, "Biological Data Storage, Management, and Interactive Utilization System" (XDB38030000)

Abstract:

[Objective] Masquerade attacks are a typical attack on email systems: the attacker illegitimately obtains a user's genuine authentication credentials and uses them to access unauthorized services, which can cause significant damage. Because email usage scenarios are complex, the data are unevenly distributed, and only a limited amount of labeled anomalous data is available, detecting masquerade attacks in email systems is difficult. [Methods] To address these issues, we propose a rule-based, self-training Auto-Encoder anomaly detection framework. First, we analyze and categorize the usage scenarios reflected in SMTP email-protocol log data and derive coarse-grained label-correction rules. Second, an Auto-Encoder performs iterative detection through self-training, and the rules correct the result of each detection round. Finally, kernel density estimation is used to select an appropriate threshold that lowers the false positive rate. [Results] On three consecutive months of data from 6,736 real corporate email accounts, the framework detected 7 anomalous accounts and 12 anomalous IP addresses, achieving the best results in a comparison with the corporate Security Operations Center (SOC) and three state-of-the-art algorithms. Compared with the SOC, the proposed method detects 75% more anomalous accounts while reducing the number of false-positive accounts by 81.3%.

Key words: semi-supervised learning, self-training, auto-encoder, masquerade attack, email protocol
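The abstract names three steps (rule-based label correction, self-training Auto-Encoder detection, and a kernel-density-estimation threshold) without giving their details. The following is a minimal, stdlib-only Python sketch of how such a loop can fit together, under loudly stated assumptions: the two account features, the rule `rule_says_benign`, the one-unit linear autoencoder `train_linear_ae` (a stand-in for whatever network the paper uses), the Silverman-bandwidth Gaussian KDE, and the "(1 - alpha) quantile of the KDE" threshold criterion are all hypothetical choices, not the paper's actual features, rules, model, or threshold rule.

```python
import math
import random
import statistics

# HYPOTHETICAL setup: each row is one account, already standardized, with
#   x[0] = distinct SMTP source IPs and x[1] = successful SMTP logins.
# Neither the features nor the rule below come from the paper.


def train_linear_ae(rows, epochs=200, lr=0.01, seed=0):
    """Fit a one-unit linear autoencoder z = we.x, x_hat = wd*z by plain SGD."""
    rng = random.Random(seed)
    d = len(rows[0])
    we = [rng.uniform(-0.5, 0.5) for _ in range(d)]  # encoder weights
    wd = [rng.uniform(-0.5, 0.5) for _ in range(d)]  # decoder weights
    for _ in range(epochs):
        for x in rows:
            z = sum(w * v for w, v in zip(we, x))
            r = [wd[j] * z - x[j] for j in range(d)]  # residual x_hat - x
            g = sum(r[j] * wd[j] for j in range(d))   # half of dLoss/dz
            for j in range(d):
                wd[j] -= lr * 2 * r[j] * z
                we[j] -= lr * 2 * g * x[j]
    return we, wd


def recon_error(model, x):
    """Squared reconstruction error ||x_hat - x||^2 for one row."""
    we, wd = model
    z = sum(w * v for w, v in zip(we, x))
    return sum((wd[j] * z - x[j]) ** 2 for j in range(len(x)))


def kde_threshold(errors, alpha=0.1):
    """Error value at which a Gaussian KDE's CDF reaches 1 - alpha."""
    n = len(errors)
    # Silverman's rule-of-thumb bandwidth; tiny fallback if all errors are equal.
    h = 1.06 * statistics.stdev(errors) * n ** (-0.2) or 1e-6

    def cdf(t):
        return sum(0.5 * (1 + math.erf((t - e) / (h * math.sqrt(2)))) for e in errors) / n

    lo, hi = min(errors) - 10 * h, max(errors) + 10 * h
    for _ in range(100):  # bisection works because the CDF is monotone
        mid = (lo + hi) / 2
        if cdf(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return hi


def rule_says_benign(x):
    """HYPOTHETICAL coarse rule: few source IPs means a benign single-location scenario."""
    return x[0] < 0


def self_train_detect(rows, rule, iters=5, alpha=0.1):
    """Self-training: retrain on rows currently labeled normal; rule-correct each round."""
    flagged = set()
    for _ in range(iters):
        train = [x for i, x in enumerate(rows) if i not in flagged]
        model = train_linear_ae(train)
        errs = [recon_error(model, x) for x in rows]
        thr = kde_threshold(errs, alpha)
        new = {i for i, e in enumerate(errs) if e > thr and not rule(rows[i])}
        if new == flagged:  # pseudo-labels stopped changing
            break
        flagged = new
    return flagged, thr


rows = []
for i in range(40):  # 40 toy "normal" accounts that follow one trend
    t = (i - 20) / 10
    rows.append([t, t + 0.05 * ((i % 5) - 2)])
rows.append([-2.0, 2.0])  # index 40: off-trend, but excused by the rule
rows.append([2.0, -2.0])  # index 41: off-trend and not excused
flagged, thr = self_train_detect(rows, rule_says_benign)
print(sorted(flagged), round(thr, 3))
```

On this toy data only index 41 ends up flagged: index 40 also reconstructs poorly, but the hypothetical rule relabels it as normal, so it stays in the next round's training set. Taking a high quantile of a smoothed error distribution, rather than a fixed cutoff, is one plausible way to trade detections against false positives; a real deployment would swap in a deeper, nonlinear Auto-Encoder and scenario rules derived from actual SMTP logs.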