Frontiers of Data & Computing ›› 2026, Vol. 8 ›› Issue (1): 45-63.

CSTR: 32002.14.jfdc.CN10-1649/TP.2026.01.005

doi: 10.11871/jfdc.issn.2096-742X.2026.01.005

• Special Issue: Computational Finance

Jump Information, Machine Learning Models, and Realized Volatility Forecasting

FENG Wenjun (冯文君)1, ZHANG Zhengjun (张正军)2,3,4,*, WANG Yiming (王一鸣)5

  1. School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
  2. School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
  3. AMSS Center for Forecasting Science, Chinese Academy of Sciences, Beijing 100190, China
  4. Department of Statistics, University of Wisconsin, Madison, WI 53706, USA
  5. School of Economics, Peking University, Beijing 100871, China
  • Received: 2025-03-20 Online: 2026-02-20 Published: 2026-02-02
  • Contact: ZHANG Zhengjun
  • About the authors:
    FENG Wenjun is a Lecturer at the School of Economics and Management, Beijing Jiaotong University. Her main research interests include financial technology.
    In this paper, she is responsible for programming and manuscript writing.
    E-mail: fengwj@bjtu.edu.cn.
    ZHANG Zhengjun is a Professor at the School of Economics and Management, University of Chinese Academy of Sciences. His main research interests include financial risk management.
    In this paper, he is responsible for writing and revising the manuscript.
    E-mail: zjz@stat.wisc.edu.
  • Funding:
    Major Program of the National Natural Science Foundation of China (71991471); Key Program of the National Natural Science Foundation of China (72442027); Young Scientists Fund of the National Natural Science Foundation of China (72401025); Research Project Fund of the Beijing International Finance Society (BIFS242001)

Abstract:

[Objective] This study examines how stock price jump characteristics and machine learning models work together in realized volatility forecasting, and compares the performance of different forecasting methods across time horizons. [Methods] Using five-minute high-frequency data for the constituent stocks of the SSE 50 Index from 2019 to 2024, we identify price jumps point by point with a threshold method and extract multidimensional features, such as jump frequency and jump magnitude, via the K-nearest neighbors (KNN) algorithm, building a feature set rich in jump information. We then apply extended Heterogeneous Autoregressive (HAR) models and ten machine learning algorithms, including KNN, Random Forest (RF), Gradient Boosting Regression Trees (GBRT), and Support Vector Regression (SVR), to forecast realized volatility over multiple horizons, and systematically assess the effect of combining machine learning methods with jump information. [Results] In-sample, both adding jump features and using machine learning models improve forecasting accuracy, with KNN and Random Forest performing best. Out-of-sample, the HAR-RV model remains the best for daily forecasts, while for weekly and monthly forecasts both jump information and machine learning models improve performance. However, once the HAR model already incorporates jump information, machine learning methods yield no further improvement. [Conclusions] This study expands the feature space for volatility forecasting and systematically evaluates the effectiveness of machine learning methods for this task. The findings indicate that multidimensional jump features carry additional information that improves medium- to long-term volatility forecasting accuracy, but that machine learning models add little incremental value once the HAR model incorporates jump information. These findings have important implications for financial market risk management and asset pricing.

Key words: realized volatility, jumps, forecasting, machine learning, high-frequency data
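
To make the ingredients named in the abstract concrete, the following is a minimal Python sketch of daily realized variance from intraday returns, a simple threshold-style jump flag, and a baseline HAR-RV regression. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the threshold specification, so the rule used here (absolute return above `c` times a robust scale estimate) is a stand-in, and the function names, the default `c=4.0`, and the 5-day and 22-day averaging windows are assumptions made for this example.

```python
import numpy as np


def realized_variance(intraday_returns):
    """Daily realized variance: the sum of squared intraday returns."""
    r = np.asarray(intraday_returns, dtype=float)
    return float(np.sum(r ** 2))


def flag_jumps(intraday_returns, c=4.0):
    """Flag returns whose magnitude exceeds c times a robust scale estimate.

    Assumption: the scale is 1.4826 * median absolute deviation. The paper's
    actual threshold rule is not stated in the abstract.
    """
    r = np.asarray(intraday_returns, dtype=float)
    scale = 1.4826 * np.median(np.abs(r - np.median(r)))
    if scale == 0.0:
        # Degenerate case: no variation to scale against, so flag nothing.
        return np.zeros(r.shape, dtype=bool)
    return np.abs(r) > c * scale


def har_design(rv, horizon=1):
    """Build HAR-RV regressors and targets from a daily RV series.

    Regressors: intercept, daily RV, 5-day mean RV, 22-day mean RV.
    Target: mean RV over the next `horizon` days.
    """
    rv = np.asarray(rv, dtype=float)
    X, y = [], []
    for t in range(21, len(rv) - horizon):
        daily = rv[t]
        weekly = rv[t - 4 : t + 1].mean()
        monthly = rv[t - 21 : t + 1].mean()
        X.append([1.0, daily, weekly, monthly])
        y.append(rv[t + 1 : t + 1 + horizon].mean())
    return np.array(X), np.array(y)


def fit_har(rv, horizon=1):
    """Estimate HAR-RV coefficients by ordinary least squares."""
    X, y = har_design(rv, horizon)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

Jump features such as a daily jump count or mean jump size could be derived from the mask returned by `flag_jumps` and appended as extra columns to the design matrix, which is one way to read the "extended HAR" models mentioned above. Setting `horizon` to 5 or 22 would correspond to weekly or monthly targets under the window lengths assumed here.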