Frontiers of Data and Computing ›› 2022, Vol. 4 ›› Issue (3): 78-89.

CSTR: 32002.14.jfdc.CN10-1649/TP.2022.03.006

doi: 10.11871/jfdc.issn.2096-742X.2022.03.006

• Special Issue: Advanced Intelligent Computing Platform and Application •

Overview of Root Cause Localization Method in Microservice Architecture

LI Siyi, MA Shiyu, CUI Liyue, ZHANG Shenglin, SUN Yongqian, ZHANG Yuzhi

  1. College of Software, Nankai University, Tianjin 300350, China
  • Received: 2022-02-24 Online: 2022-06-20 Published: 2022-06-20
  • Contact: ZHANG Shenglin E-mail: lisiyimail@qq.com; nkcs77@163.com; 2320190026@mail.nankai.edu.cn; zhangsl@nankai.edu.cn; sunyongqian@nankai.edu.cn; zyz@nankai.edu.cn

Abstract:

[Objective] When key performance indicators of a cloud-native system become abnormal, operations and maintenance engineers must quickly untangle the correlations among anomalies hidden behind alarm storms and large numbers of abnormal indicators, localize the root cause accurately, and restore service rapidly. [Methods] This paper describes how to build a fault propagation graph under a large-scale microservice architecture and introduces root cause localization techniques based on graph reasoning. Drawing on experience in cloud platform operation and maintenance and in high-availability capacity building, we survey, compare, and summarize existing root cause localization methods. [Results] Root cause localization based on graph reasoning significantly improves the stability and reliability of cloud systems in large data centers. [Limitations] These methods rely on a stable monitoring infrastructure and on accurate anomaly detection for indicators. [Conclusions] As digital transformation deepens, root cause localization for indicators under the microservice architecture will play an increasingly important role in ensuring the stability of large-scale cloud platforms.

Key words: cloud native, microservices, AIOps, root cause localization