Frontiers of Data and Computing ›› 2026, Vol. 8 ›› Issue (1): 45-63.

CSTR: 32002.14.jfdc.CN10-1649/TP.2026.01.005

doi: 10.11871/jfdc.issn.2096-742X.2026.01.005

• Special Issue: Computational Finance

Jump Information, Machine Learning Models, and Realized Volatility Forecasting

FENG Wenjun1, ZHANG Zhengjun2,3,4,*, WANG Yiming5

    1. School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China
    2. School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
    3. AMSS Center for Forecasting Science, Chinese Academy of Sciences, Beijing 100190, China
    4. Department of Statistics, University of Wisconsin, Madison, WI 53706, USA
    5. School of Economics, Peking University, Beijing 100871, China
  • Received: 2025-03-20 Online: 2026-02-20 Published: 2026-02-02
  • Contact: ZHANG Zhengjun E-mail: fengwj@bjtu.edu.cn; zjz@stat.wisc.edu

Abstract:

[Objective] This study examines how stock price jump characteristics and machine learning models jointly affect realized volatility forecasting, and compares the performance of different prediction methods across forecast horizons. [Methods] Using five-minute high-frequency data on SSE 50 Index constituent stocks from 2019 to 2024, we identify stock price jumps point by point with the threshold method and extract multidimensional features such as jump frequency and jump magnitude via the K-nearest neighbors (KNN) algorithm, constructing a feature system enriched with jump information. We then apply extended Heterogeneous Autoregressive (HAR) models and ten machine learning algorithms, including KNN, Random Forest (RF), Gradient Boosting Regression Trees (GBRT), and Support Vector Regression (SVR), to predict realized volatility over multiple horizons and to systematically assess the combined effect of machine learning methods and jump information. [Results] In-sample, incorporating jump features and applying machine learning models each improve forecasting accuracy, with KNN and Random Forest performing best. Out-of-sample, the HAR-RV model performs best for daily forecasts, whereas for weekly and monthly forecasts both jump information and machine learning models improve prediction performance. However, once the HAR model already incorporates jump information, machine learning methods provide no additional predictive improvement. [Conclusions] This study expands the feature space for volatility forecasting and systematically evaluates the effectiveness of machine learning methods in volatility prediction. The findings indicate that multidimensional jump features carry additional information that improves medium- to long-term volatility forecasting accuracy, while machine learning models add little incremental value once the HAR model incorporates jump information. These insights have significant implications for financial market risk management and asset pricing.
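
As a rough illustration of the pipeline summarized in the abstract, the following Python sketch computes a daily realized measure from intraday returns, flags jumps with a simple threshold rule, and fits a HAR-type regression extended with a jump feature. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the multiplier `k`, the median-based scale, the use of daily jump counts as the only jump feature, and the 5-day and 22-day averaging windows are all illustrative choices that the abstract does not specify. The KNN-based feature extraction described above is not reproduced here.

```python
import numpy as np


def realized_variance(intraday_returns):
    """Daily realized variance: sum of squared intraday returns (one row per day)."""
    r = np.asarray(intraday_returns, dtype=float)
    return (r ** 2).sum(axis=1)


def threshold_jump_flags(intraday_returns, k=4.0):
    """Flag returns whose magnitude exceeds k times a robust per-day scale.

    ASSUMPTION: k and the median-absolute-return scale are illustrative;
    the abstract only says that a threshold method is used.
    """
    r = np.asarray(intraday_returns, dtype=float)
    scale = 1.4826 * np.median(np.abs(r), axis=1, keepdims=True)
    return np.abs(r) > k * scale


def har_design(rv, jump_counts=None):
    """Build HAR regressors (daily value, 5-day mean, 22-day mean) for a next-day target.

    ASSUMPTION: an optional daily jump count is appended as one extra regressor.
    """
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    for t in range(21, len(rv) - 1):
        row = [1.0, rv[t], rv[t - 4:t + 1].mean(), rv[t - 21:t + 1].mean()]
        if jump_counts is not None:
            row.append(float(jump_counts[t]))
        rows.append(row)
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)


def fit_ols(X, y):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

With an array of shape (days, bars per day) holding five-minute log returns, one would call `realized_variance` and `threshold_jump_flags`, sum the flags per day to obtain jump counts, and pass both to `har_design` before fitting. Replacing `fit_ols` with a tree-based or nearest-neighbor regressor on the same design matrix gives the kind of comparison between HAR and machine learning models that the abstract describes.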

Key words: realized volatility, jumps, forecasting, machine learning, high-frequency data