Frontiers of Data and Computing ›› 2024, Vol. 6 ›› Issue (4): 87-95.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.04.007

doi: 10.11871/jfdc.issn.2096-742X.2024.04.007

• Special Issue: Fundamental Software Stack and Systems for National Scientific Data Centers •

A Root Cause Localization Method Based on Service Dependency Graph for Microservice System Failures

ZHANG Qixun1,*, JIA Tong2, YANG Yong3, LI Ying4

  1. School of Software and Microelectronics, Peking University, Beijing 102600, China
  2. Institute for Artificial Intelligence, Peking University, Beijing 100871, China
  3. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China
  4. National Engineering Research Center for Software Engineering, Peking University, Beijing 100871, China
  • Received: 2024-02-05 Online: 2024-08-20 Published: 2024-08-20

Abstract:

[Objective] Microservice systems fail frequently and anomalies propagate quickly, while fine service granularity, frequent updates, and complex service dependencies make diagnosis difficult. To address this, the paper proposes a rapid root cause localization method based on dynamic service dependency graphs. [Methods] The method uses the configuration information and log data of microservices to generate service dependency graphs dynamically, capturing changes in service dependencies as they occur. When a failure occurs, it combines the service dependency graph with anomaly event data to infer the causal chain of anomalies and construct an anomaly causality graph. Taking the weights of service dependencies into account, it then searches and ranks potential root cause nodes in the service dependency graph to locate the source of the anomaly. [Results] Experimental results show that the proposed method achieves an average precision of 66% for top-5 root cause localization, outperforming existing comparable methods.

Key words: microservice, service dependency, anomaly causal relationship, root cause localization
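
The three steps named in the abstract (dependency graph, anomaly causality graph, weighted ranking) can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. The function names, the use of call counts as dependency weights, and the scoring rule (weight a service explains minus weight it may inherit from its own anomalous dependencies) are all assumptions made for illustration, and the sample data is invented.

```python
from collections import defaultdict


def build_dependency_graph(call_records):
    """Count caller -> callee invocations seen in (hypothetical) log records."""
    graph = defaultdict(lambda: defaultdict(int))
    for caller, callee in call_records:
        graph[caller][callee] += 1
    return {caller: dict(callees) for caller, callees in graph.items()}


def build_causality_graph(dep_graph, anomalous):
    """Keep edges between anomalous services, reversed to point cause -> effect.

    The edge weight is the share of the caller's outgoing calls that go to
    the callee (an assumed weighting, not taken from the paper).
    """
    causal = defaultdict(dict)
    for caller, callees in dep_graph.items():
        if caller not in anomalous:
            continue
        total = sum(callees.values())
        for callee, count in callees.items():
            if callee in anomalous:
                causal[callee][caller] = count / total
    return dict(causal)


def rank_root_causes(causal, anomalous, top_k=5):
    """Rank anomalous services by explained minus inherited weight (assumed heuristic)."""
    score = {service: 0.0 for service in anomalous}
    for cause, effects in causal.items():
        for effect, weight in effects.items():
            score[cause] += weight   # the cause explains an upstream anomaly
            score[effect] -= weight  # the effect may merely inherit the anomaly
    ranked = sorted(score.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Illustrative data only; not taken from the paper's experiments.
records = (
    [("gateway", "order")] * 3
    + [("gateway", "user")]
    + [("order", "db")] * 2
    + [("order", "inventory")] * 2
    + [("user", "db")]
)
anomalous = {"gateway", "order", "db"}

dep_graph = build_dependency_graph(records)
causal = build_causality_graph(dep_graph, anomalous)
ranking = rank_root_causes(causal, anomalous)
print(ranking)
```

In this toy input, `db` ranks first because it is anomalous, is depended on by the anomalous `order` service, and has no anomalous dependencies of its own. A production system would also need time-windowed graph updates and a more principled weighting, which this sketch leaves out.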