Frontiers of Data and Computing ›› 2024, Vol. 6 ›› Issue (2): 56-66.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.02.006

doi: 10.11871/jfdc.issn.2096-742X.2024.02.006

• Special Issue: 30th Anniversary of China’s Full-Functional Connection to the Internet

Email Masquerade Attack Detection Based on Semi-Supervised Learning

LI Chang1,2, LONG Chun1,*, ZHAO Jing1, YANG Yue1, WANG Yueda1, PAN Qingfeng3, YE Xiaohu4, WU Tiejun5, TANG Ning6

    1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100083, China
    2. University of Chinese Academy of Sciences, Beijing 100101, China
    3. Guangdong Coremail Technology Ltd., Guangzhou, Guangdong 511400, China
    4. NSFOCUS Technologies Group Co., Ltd., Beijing 100089, China
    5. School of Cyber Science and Engineering, Southeast University, Nanjing, Jiangsu 211189, China
    6. Topsec Technologies Group Inc., Beijing 100193, China
  • Received: 2024-03-27 Online: 2024-04-20 Published: 2024-04-26

Abstract:

[Objective] Masquerade attacks are a typical threat to email systems: attackers illicitly obtain genuine user authentication credentials and use them to access unauthorized services, causing significant damage. Because email usage scenarios are complex, the data are irregularly distributed, and labeled anomaly data are scarce, detecting masquerade attacks in email systems is challenging. [Methods] To address these issues, we propose a rule-based self-training Auto-Encoder anomaly detection framework. First, the framework analyzes and categorizes the usage scenarios in SMTP email protocol log data and introduces coarse-grained label correction rules. Next, it applies an Auto-Encoder for iterative detection through self-training, with the detection results of each iteration refined by the rules. Finally, kernel density estimation is used to find an appropriate threshold that reduces the false positive rate. [Results] On data from 6736 real corporate email accounts over three months, the framework detected 7 anomalous accounts and 12 anomalous IP addresses. The proposed method detects more than 75% anomalous accounts compared to those detected by the corporate Security Operations Center (SOC), while reducing the number of false positive accounts by 81.3%.

Key words: semi-supervised learning, self-training, auto-encoder, masquerade attack, email protocol
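
The final step described in the abstract, choosing a threshold with kernel density estimation, can be illustrated with a minimal sketch. The abstract does not state how the threshold is derived from the estimated density, so everything below is an assumption made for illustration only: the anomaly scores are taken to be Auto-Encoder reconstruction errors, the kernel is Gaussian with Silverman's rule-of-thumb bandwidth, and the threshold is placed at the first density valley to the right of the main mode. The function names (`kde_threshold`, `gaussian_kde`, `silverman_bandwidth`) and the sample scores are hypothetical and are not taken from the paper.

```python
import math
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth (assumed here): 1.06 * sigma * n^(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian KDE of `samples` at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def kde_threshold(scores, grid_size=200):
    """Place the threshold at the first density valley right of the main mode.

    Falls back to the maximum score (nothing is flagged) when the scores are
    degenerate or the estimated density has no valley.
    """
    lo, hi = min(scores), max(scores)
    if len(scores) < 2 or lo == hi:
        return hi
    density = gaussian_kde(scores, silverman_bandwidth(scores))
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    dens = [density(x) for x in grid]
    # Index of the main mode, assumed to correspond to normal behavior.
    peak = max(range(grid_size), key=dens.__getitem__)
    for i in range(peak + 1, grid_size - 1):
        if dens[i] <= dens[i - 1] and dens[i] < dens[i + 1]:
            return grid[i]
    return hi


# Hypothetical reconstruction errors: many low scores and a few high ones.
scores = [0.05 + 0.0025 * i for i in range(40)] + [0.88, 0.90, 0.92, 0.95]
threshold = kde_threshold(scores)
flagged = [s for s in scores if s > threshold]
print(f"threshold={threshold:.3f}, flagged={flagged}")
```

Placing the cut at a density valley rather than at a fixed percentile lets the threshold follow the observed score distribution, which is one plausible way such a step could limit false positives. The valley criterion, bandwidth rule, and grid size used here are illustrative choices.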