数据与计算发展前沿 ›› 2020, Vol. 2 ›› Issue (6): 82-89.

doi: 10.11871/jfdc.issn.2096-742X.2020.06.009

• 技术与应用 • 上一篇    下一篇

FINFLO:快速局部异常点检测算法

杨校林1,2,李菁菁1,*(),李易1   

  1. 1. 中国科学院计算机网络信息中心,北京 100190
    2. 中国科学院大学,北京 100049
  • 收稿日期:2020-08-08 出版日期:2020-12-20 发布日期:2020-12-29
  • 通讯作者: 李菁菁
  • 作者简介:杨校林,中国科学院计算机网络信息中心,在读硕士研究生,主要研究领域为聚类与异常点检测,智能网络运管分析技术等。本文中承担的任务是算法设计、实验设计与文献撰写。
    YANG Xiaolin is a graduate student in Computer Network Information Center of Chinese Academy of Sciences. His main research areas are clustering and anomaly detection, and intelligent network operation and management analysis technology. In this paper, he is responsible for the algorithm design, experiments design and paper writing. Email: xlyang@cnic.cn|李菁菁,中国科学院计算机网络信息中心,高级工程师,硕士生导师。主要研究领域为未来网络体系结构,大规模云基础设施智能监测感知,智能网络运管分析技术等。主要负责中国科技网全网及超大型信息系统的设计、建设和长期运管等工作。主持和参与了多项发改委、科技部、国家自然科学基金委的重点课题和中国科学院信息化专项工程建设项目。本文中负责思路解析和总体统稿。
    LI Jingjing is senior engineer and master tutor of Computer Network Information Center of Chinese Academy of Sciences. His main research areas are future network architecture, large-scale cloud infrastructure intelligent monitoring and perception, and intelligent network operation and management analysis technology. In this paper, he is responsible for the idea analysis and overall draft.|李易,中国科学院计算机网络信息中心,工程师,长期从事科技网核心网运维工作,对核心网运维和优化有丰富的经验。现在主要负责科技网核心网网络优化和监控系统方面的设计、建设和DNS运维及负载均衡建设工作。参与了科技部和中国科学院信息化专线工程建设项目。本文中负责数据整理。
    LI Yi is an engineer of Computer Network Information Center of Chinese Academy of Sciences. He has long been engaged in the operation and maintenance of the core network of the science and technology network, and has extensive experience in the operation, maintenance and optimization of the core network. In this paper, he is responsible for data preparation.E-mail: liyi@cstnet.cn

FINFLO: A New Fast Local Outliers Detection Algorithm

YANG Xiaolin1,2,LI Jingjing1,*(),LI Yi1   

  1. 1. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2020-08-08 Online:2020-12-20 Published:2020-12-29
  • Contact: LI Jingjing

摘要:

【目的】基于局部密度的LOF算法时间复杂度高,且容易将处于簇边缘的正常对象误判成异常对象,INFLO算法引进反向k-近邻解决LOF算法这一缺陷,但是计算每个对象的局部异常因子时都会使用反向k-近邻没有必要且耗费时间。【方法】通过对两个算法的分析,本文改进了INFLO算法,提出了一种快速异常点检测算法FINFLO( faster Influenced outlierness),该算法的主要思想是:计算对象的局部因子时尽量避免考虑反向k-近邻对象,尽可能地只利用k-近邻对象。首先,计算出所有对象的反向k-近邻对象个数的均值,然后在计算对象的局部异常因子时,如果对象的反向k-近邻对象个数不小于所有对象的反向k-近邻对象个数均值,则只需要考虑对象的k-近邻对象,否则需要同时考虑k-近邻对象和反向k-近邻对象。【结论】实验结果显示,该算法能够提高离群点检测的精度,降低时间复杂度,实现有效的局部离群点的检测。

关键词: 局部密度, 异常因子, 局部离群点, k-近邻, 反向k-近邻

Abstract:

[Objective] Local density based LOF algorithm has high time complexity, and it tends to misjudge the normal objects at the edge of the cluster as exceptions. The inverse k-nearest-neighbor algorithm is introduced to solve the problem of LOF algorithm in INFLO algorithm. However, it is unnecessary and time-consuming to use the inverse k-nearest-neighbor when calculating the local outlier factor of each object. [Methods] Through the analysis of the two algorithms, this paper proposes a new fast anomaly detection algorithm, named Faster Influenced Outlierness, FINFLO. When calculating the local factors of objects, FINFLO tries to avoid considering reverse k-nearest neighbor objects, and use only k-nearest neighbor objects as much as possible. If the number of reverse k-nearest neighbor objects is not less than the mean of all reverse k-nearest neighbor objects, only k-nearest neighbor objects need to be considered, otherwise reverse k-nearest neighbor objects need to be considered. [Conclusions] Experimental results show that the algorithm can improve the accuracy of outlier detection, reduce the time complexity, and achieve effective local outlier detection.

Key words: local density, outlier factor, local outlier, k-nearest neighbor, reverse k-nearest neighbor