Frontiers of Data and Computing ›› 2024, Vol. 6 ›› Issue (5): 126-138.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.05.012

doi: 10.11871/jfdc.issn.2096-742X.2024.05.012


Robust AdaBoost Regression Model Based on Double LOF and Inverse Cross-Validation

ZENG Fanbei1, YANG Lianqiang2,*

  1. School of Big Data and Statistics, Anhui University, Hefei, Anhui 230601, China
    2. School of Artificial Intelligence, Anhui University, Hefei, Anhui 230601, China
  • Received: 2023-01-03; Online: 2024-10-20; Published: 2024-10-21
  • Corresponding author: * YANG Lianqiang (E-mail: yanglq@ahu.edu.cn)
  • About the authors: ZENG Fanbei is a postgraduate student at the School of Big Data and Statistics, Anhui University. His research interest is machine learning. In this paper, he was responsible for drafting the manuscript, collecting the data, and implementing the algorithm. E-mail: 2275920905@qq.com
    YANG Lianqiang, Ph.D., is an associate professor and master's supervisor at the School of Artificial Intelligence, Anhui University. He has led several national and provincial natural science foundation projects and has published papers in important academic journals at home and abroad. His research interests include machine learning and regression analysis. In this paper, he was responsible for the ideas, methods, and experimental design. E-mail: yanglq@ahu.edu.cn
  • Funding:
    the Natural Science Foundation of the Higher Education Institutions of Anhui Province (KJ2021A0049); the Natural Science Foundation of Anhui Province (2208085MA06)



Abstract:

[Objective] The robustness of the traditional AdaBoost regression model is insufficient, and the improved AdaBoost.RT+ and AdaBoost.RS algorithms still suppress abnormal data only weakly and identify it with low accuracy, so enhancing the robustness of AdaBoost methods has clear practical value. [Methods] The proposed AdaBoost.R_LOF model first introduces the double LOF and inverse cross-validation algorithms and combines them to characterize the degree of abnormality of each observation as a probability. Then, building on the AdaBoost.R2 algorithm, it assigns each observation a weight coefficient according to its degree of abnormality, suppressing the influence of abnormal data without affecting the iterations on normal data. [Results] The new model is more robust and achieves a smaller mean squared prediction error. [Limitations] The method introduces additional hyperparameters that must be tuned to the distribution of the data set. [Conclusions] Simulations and real-data applications show that, on data sets with different proportions of outliers, the new model is more robust and estimates better than the AdaBoost.R2, AdaBoost.RT+, and AdaBoost.RS algorithms.

Key words: AdaBoost, double LOF, inverse cross-validation, AdaBoost.R_LOF
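
Note: the Methods summary above describes the algorithm only at a high level, and the details of the double LOF and inverse cross-validation procedures are not given on this page. The following is a minimal, illustrative Python sketch of the general idea rather than the authors' AdaBoost.R_LOF implementation: it uses scikit-learn's LocalOutlierFactor as a stand-in for the paper's anomaly-probability step and turns the LOF score into a sample weight that down-weights suspected outliers before fitting an AdaBoost.R2-style regressor (scikit-learn's AdaBoostRegressor). The toy data and the weighting rule are assumptions made purely for illustration.

import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.neighbors import LocalOutlierFactor

# Toy data: a smooth signal with roughly 10% of the responses contaminated by large noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(0.0, 0.1, 300)
y[:30] += rng.normal(0.0, 3.0, 30)

# Score each observation with LOF on the joint (X, y) space.
# negative_outlier_factor_ is close to -1 for inliers and much more negative for outliers.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(np.column_stack([X, y]))
lof_score = -lof.negative_outlier_factor_          # about 1 for inliers, larger for outliers

# Hypothetical weighting rule: shrink the initial sample weight of suspected outliers.
sample_weight = np.clip(1.0 / lof_score, 0.0, 1.0)

# scikit-learn's AdaBoostRegressor implements AdaBoost.R2 (Drucker, 1997).
model = AdaBoostRegressor(n_estimators=100, loss="linear", random_state=0)
model.fit(X, y, sample_weight=sample_weight)
print("Training MSE with LOF-based weights:", np.mean((model.predict(X) - y) ** 2))

In the paper's model, the anomaly probability comes from combining double LOF with inverse cross-validation rather than a single LOF pass, but the underlying idea, reducing the contribution of likely outliers while leaving normal observations untouched, is the same kind of mechanism.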