Research on Unsupervised KPI Anomaly Detection Based on Deep Learning

doi:10.11871/jfdc.issn.2096-742X.2020.03.008

Frontiers of Data and Computing, 2020, Vol. 2, Issue (3): 87-100.

Topic: Next-Generation Internet Technologies and Applications

Special Issue: Next-Generation Internet Technologies and Applications (Part I)

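Zhang Shenglin¹, Lin Xiaofei¹, Sun Yongqian¹,*, Zhang Yuzhi¹, Pei Dan²

1. College of Software, Nankai University, Tianjin 300350, China
2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Received: 2020-04-10 Online: 2020-06-20 Published: 2020-08-19
Corresponding author: Sun Yongqian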
About the authors:

Zhang Shenglin, Ph.D., is currently an assistant professor in the College of Software, Nankai University, Tianjin, China. His current research interests include failure detection, diagnosis, and prediction in data center networks. He has published more than 15 papers indexed by SCI/EI.
In this paper, he is mainly responsible for designing the paper architecture and revising the paper.
E-mail: zhangsl@nankai.edu.cn

Lin Xiaofei is currently a master's student in the College of Software, Nankai University, Tianjin, China. Lin's research interests include anomaly detection and deep learning.
In this paper, Lin is mainly responsible for the related work investigation and the experimental evaluation.
E-mail: filler.helloworld@gmail.com

Sun Yongqian, Ph.D., is currently an assistant professor in the College of Software, Nankai University, Tianjin, China. His research interests include anomaly detection, root cause localization, and high-performance switching in data centers.
In this paper, he is mainly responsible for the related work investigation.
E-mail: sunyongqian@nankai.edu.cn

Zhang Yuzhi, Ph.D., is currently a distinguished professor and the dean of the College of Software, Nankai University. His research interests include deep learning and other aspects of artificial intelligence.
In this paper, he is mainly responsible for the related work investigation and guidance.
E-mail: zyz@nankai.edu.cn

Pei Dan, Ph.D., is currently an associate professor in the Department of Computer Science and Technology, Tsinghua University. His research interests include network and service management in general.
In this paper, he is mainly responsible for the related work investigation.
E-mail: peidan@tsinghua.edu.cn
Funding:
National Key Research and Development Program of China (2018YFB0204304)

Abstract

[Objective] Key performance indicator (KPI) anomaly detection, the basis of artificial intelligence for IT operations (AIOps) in Internet services, is of vital importance to rapid failure detection and mitigation. [Scope of the literature] This paper surveys unsupervised KPI anomaly detection methods based on deep generative models, covering work published in China and abroad. [Methods] We systematically describe the theoretical models of Donut, Bagel, and Buzz, three unsupervised KPI anomaly detection methods, and analyze their advantages and limitations in terms of accuracy and efficiency. [Results] We evaluate the performance of the three methods on KPI data from a production environment. [Limitations] KPI anomaly detection methods based on deep generative models are still evolving, and we will explore more new methods in this area in the future. [Conclusions] The choice of a deep generative model should depend on the characteristics of the KPI data: for KPI data that is sensitive to time information, Bagel should be used for anomaly detection; for non-seasonal, complex KPI data, Buzz should be used.

Key words: deep learning, unsupervised learning, key performance indicator, anomaly detection, generative model

Zhang Shenglin, Lin Xiaofei, Sun Yongqian, Zhang Yuzhi, Pei Dan. Research on Unsupervised KPI Anomaly Detection Based on Deep Learning[J]. Frontiers of Data and Computing, 2020, 2(3): 87-100.

Figures/Tables (16): Figures 1-12, Tables 1-4

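Date and time	2018/7/3 16:25:13, Tuesday
Decomposed information	25 (minute), 16 (hour), 2 (day of week)
One-hot encoding	0…0,1,0…0 | 0…0,1,0000000 | 0,1,00000 (the first ellipsis omits 23 zeros, the second omits 32 zeros, and the third omits 14 zeros)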
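
The encoding in the table above can be reproduced with a short sketch. The layout used here is inferred from the table's example rather than stated in the text: a 60-slot minute segment, a 24-slot hour segment, and a 7-slot day-of-week segment, concatenated in that order, with minutes and hours indexed from 0 and the day of week counted from Monday = 1. The function names `one_hot` and `encode_timestamp` are hypothetical.

```python
from datetime import datetime


def one_hot(index, size):
    """Return a list of `size` zeros with a single 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def encode_timestamp(ts):
    """Concatenate one-hot segments for minute, hour, and day of week.

    Assumption (inferred from the table's example): the day of week is
    numbered Monday = 1 ... Sunday = 7, so it is shifted by one to get a
    0-based position inside its 7-slot segment.
    """
    minute = one_hot(ts.minute, 60)
    hour = one_hot(ts.hour, 24)
    day = one_hot(ts.isoweekday() - 1, 7)
    return minute + hour + day


ts = datetime(2018, 7, 3, 16, 25, 13)  # the Tuesday used in the table
vec = encode_timestamp(ts)
print(len(vec))                                   # 91
print([i for i, v in enumerate(vec) if v == 1])   # [25, 76, 85]
```

The three set positions match the table: 25 zeros precede the minute bit, 16 zeros precede the hour bit inside its segment, and one zero precedes the day-of-week bit inside its segment.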

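Item	Donut	Bagel	Buzz
Running time	Relatively short	Relatively short	Relatively long
Computing resource requirement	Moderate	Moderate	Relatively high
Requires a large amount of data	Yes	Yes	Yes
Relies on anomaly labels	No	No	No
Anomaly detection accuracy	Fairly good	Fairly good	Fairly good
Suitable for seasonal KPIs	Yes	Yes	No
Suitable for non-seasonal KPIs	No	No	Yes
Sensitive to time information	No	Yes	No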
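
As a rough illustration, the last three rows of the comparison above can be written as a lookup that filters the methods by KPI characteristics. This is only a sketch: the dictionary keys and the function `candidate_methods` are hypothetical names, and the fallback for a non-seasonal but time-sensitive KPI (a case the table does not cover) is an assumption.

```python
# Suitability flags copied from the comparison table (last three rows).
TRAITS = {
    "Donut": {"seasonal": True, "non_seasonal": False, "time_sensitive": False},
    "Bagel": {"seasonal": True, "non_seasonal": False, "time_sensitive": True},
    "Buzz": {"seasonal": False, "non_seasonal": True, "time_sensitive": False},
}


def candidate_methods(seasonal, time_sensitive):
    """Return the methods whose table entries fit the given KPI traits."""
    key = "seasonal" if seasonal else "non_seasonal"
    fits = [name for name, traits in TRAITS.items() if traits[key]]
    if time_sensitive:
        preferred = [name for name in fits if TRAITS[name]["time_sensitive"]]
        # Assumption: if no fitting method is time-sensitive, keep all fits.
        return preferred or fits
    return fits


print(candidate_methods(seasonal=True, time_sensitive=True))    # ['Bagel']
print(candidate_methods(seasonal=True, time_sensitive=False))   # ['Donut', 'Bagel']
print(candidate_methods(seasonal=False, time_sensitive=False))  # ['Buzz']
```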

Number of KPI series	50
Number of KPI data points	226,200
Number of missing points	1,539
Number of anomalous data points	2,844

Method	Precision	Recall	F-Score
Donut	0.88	0.82	0.84
Bagel	0.91	0.85	0.88
Buzz	0.78	0.44	0.56
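
For readers who want to relate the three columns, the sketch below recomputes an F-score from the reported precision and recall. It assumes the F-Score column is the balanced F1 score (the harmonic mean of precision and recall); the page does not state which variant was used, and the helper `f_score` is a hypothetical name.

```python
def f_score(precision, recall):
    """Balanced F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-Score) copied from the table above.
reported = {
    "Donut": (0.88, 0.82, 0.84),
    "Bagel": (0.91, 0.85, 0.88),
    "Buzz": (0.78, 0.44, 0.56),
}

for name, (p, r, f) in reported.items():
    print(f"{name}: recomputed {f_score(p, r):.2f}, reported {f:.2f}")
```

Under this assumption the recomputed values agree with the table for Bagel and Buzz, while Donut comes out at about 0.85 against the reported 0.84. A gap of that size could arise if the reported score was computed from unrounded precision and recall or in some other way, which this page does not specify.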
