Frontiers of Data and Computing ›› 2026, Vol. 8 ›› Issue (3): 29-39.

doi: 10.11871/jfdc.issn.2096-742X.2026.03.003

• Special Issue: Call for Papers for the 21st National Conference on Scientific Computing

A Deep Learning-Based Intelligent Anomaly Detection Method for the Lustre File System

HOU Siqi1, CHENG Yaosong1, CHENG Yaodong1,2,*, LI Haibo1,2, BI Yujiang1, YAO Qiuling1

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2025-10-27 Online: 2026-06-20 Published: 2026-06-18
  • Contact: CHENG Yaodong E-mail: housq@ihep.ac.cn; chyd@ihep.ac.cn

Abstract:

[Background] The Lustre file system is a key foundation for scientific computing. As storage capacity and user workloads continue to grow, the load on storage systems keeps rising. Abnormal read and write requests from a single user can slow down the entire storage cluster and degrade the experience of all users. Traditionally, maintainers handle such anomalies by browsing logs, identifying the abnormal users or read/write requests, and then applying troubleshooting measures to restore system access speed. [Purpose] To improve the efficiency of anomaly diagnosis and replace manual log screening by maintainers, this study introduces deep learning into the operation and maintenance process to diagnose abnormal read and write behavior automatically. [Method] This study builds an intelligent abnormal-behavior detection system for the Lustre file system, covering user behavior data collection, time-series data processing, model construction, training, and deployment verification. The system converts user read and write information into time-series data, builds a long short-term memory (LSTM) network, and trains the model with unsupervised learning. [Conclusions] The model was trained and validated on data from Lustre's metadata targets (MDTs) and object storage targets (OSTs). Experimental results show that the proposed method can significantly improve anomaly detection accuracy and effectively reduce the false alarm rate. It can reduce the time maintainers spend locating errors and improve the efficiency of anomaly handling for file systems in scientific computing environments.
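The abstract does not describe how user read and write information is converted into time-series data, so the following is only a minimal sketch of one common way to do such a conversion, not the authors' pipeline. It assumes hypothetical records of the form (timestamp in seconds, user, operation count), aggregates them into fixed-interval per-user series, and cuts each series into overlapping windows of the kind a sequence model such as an LSTM typically takes as input. The function names, the record layout, and the bucket and window sizes are all illustrative assumptions.

```python
from collections import defaultdict


def to_time_series(events, bucket_seconds, start, end):
    """Aggregate (timestamp, user, op_count) records into per-user series.

    Each series has one entry per fixed-length time bucket in [start, end).
    The record layout is a hypothetical example, not a Lustre log format.
    """
    n_buckets = (end - start + bucket_seconds - 1) // bucket_seconds
    series = defaultdict(lambda: [0] * n_buckets)
    for ts, user, count in events:
        if start <= ts < end:  # ignore records outside the observation range
            series[user][(ts - start) // bucket_seconds] += count
    return dict(series)


def sliding_windows(values, window):
    """Split one series into overlapping windows of a fixed length."""
    return [values[i:i + window] for i in range(len(values) - window + 1)]


# Illustrative input: two users over three one-minute buckets.
events = [
    (0, "alice", 5),
    (30, "alice", 7),
    (70, "alice", 2),
    (65, "bob", 100),
    (130, "bob", 1),
]
per_user = to_time_series(events, bucket_seconds=60, start=0, end=180)
windows = {user: sliding_windows(vals, window=2) for user, vals in per_user.items()}
```

In an unsupervised setup of this general kind, a model is usually fitted to windows from normal operation, and windows it predicts or reconstructs poorly are flagged for review. Whether the paper scores anomalies this way is not stated in the abstract.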

Key words: AIOps, storage system, time-series data processing