An Analysis of Stock Trading Algorithms Based on Reinforcement Learning Methods

doi:10.11871/jfdc.issn.2096-742X.2026.01.004

Abstract

[Objective] Stock trading is a central topic in finance, and the integration of reinforcement learning models into trading algorithms has attracted significant attention. [Coverage] This paper surveys recent literature on reinforcement learning algorithms designed for stock trading. [Methods] The paper systematically outlines the core challenges of intelligent computing in stock market modeling, discusses the limitations of traditional statistical methods and machine learning models in the stock domain, and analyzes state-of-the-art reinforcement learning algorithms for stock trading. It compares three representative algorithms empirically along three dimensions: return stability, risk control, and computational efficiency. Based on the experimental results, it examines the strengths and weaknesses of each algorithm and proposes future research directions, providing a theoretical foundation and practical reference for further studies. [Conclusions] Although combining reinforcement learning trading algorithms with LSTM and NLP models has shown promising backtesting performance, the added model complexity significantly increases computation time and reduces efficiency in real-time analysis scenarios. Future research needs to advance both efficient algorithm design and hardware-adaptation optimization to address these limitations.

Key words: reinforcement learning, stock trading, high-performance computing

LIAO Yuming, LU Yutong. An Analysis of Stock Trading Algorithms Based on Reinforcement Learning Methods[J]. Frontiers of Data and Computing, 2026, 8(1): 35-44, https://cstr.cn/32002.14.jfdc.CN10-1649/TP.2026.01.004.

Figures/Tables

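Table 1

Symbol description

Symbol	Description
$w_i$	Portfolio weight of asset $i$
$N$	Total number of observation periods
$R$	Return
$R_f$	Risk-free rate of return
$R_x$	Expected return
$P$	Value
MAR	Minimum acceptable return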
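
As a rough illustration of how the symbols in Table 1 usually fit together, the sketch below computes a weighted portfolio return, a Sharpe ratio, and a Sortino ratio. The formulas are the standard textbook definitions and are an assumption here, since the paper's own equations are not reproduced in this section. All function names are hypothetical.

```python
import math


def portfolio_return(weights, asset_returns):
    """One-period portfolio return: sum over assets of w_i * R_i."""
    if len(weights) != len(asset_returns):
        raise ValueError("weights and asset_returns must have the same length")
    return sum(w * r for w, r in zip(weights, asset_returns))


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return over R_f divided by its sample standard deviation.

    Assumption: `returns` holds N per-period returns and `risk_free` is the
    per-period risk-free rate; no annualisation is applied.
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two observations")
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / n
    std = math.sqrt(sum((e - mean) ** 2 for e in excess) / (n - 1))
    if std == 0.0:
        raise ValueError("returns have zero variance")
    return mean / std


def sortino_ratio(returns, mar=0.0):
    """Mean return above MAR divided by the downside deviation below MAR."""
    n = len(returns)
    if n == 0:
        raise ValueError("need at least one observation")
    mean_excess = sum(r - mar for r in returns) / n
    downside = math.sqrt(sum(min(0.0, r - mar) ** 2 for r in returns) / n)
    if downside == 0.0:
        raise ValueError("no returns fall below MAR")
    return mean_excess / downside
```

Unlike the Sharpe ratio, the Sortino ratio penalises only returns that fall below MAR, so it can rank two strategies differently when one of them has large upside swings.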

Fig.1

Fig.2

Fig.3

Fig.4

Table 2

Table 3

References

[1]	JORION P, GOETZMANN W. Global Stock Markets in the Twentieth Century[J]. The Journal of Finance, 1999, 54(3): 953-980. doi: 10.1111/jofi.1999.54.issue-3
[2]	ARIYO A, ADEWUMI A, AYO C. Stock Price Prediction Using the ARIMA Model[C]. IEEE UKSim-AMSS 16th International Conference on Computer Modelling and Simulation, 2014: 106-112.
[3]	BOYLE P. Options: A Monte Carlo Approach[J]. Journal of Financial Economics, 1977, 4(3): 323-338. doi: 10.1016/0304-405X(77)90005-8
[4]	DIXON M, HALPERIN I, BILOKON P. Machine Learning in Finance[M]. Springer International Publishing, 2020, 1170: 111-116.
[5]	LIU A, FENG B, WANG B, et al. DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model[EB/OL]. [2024-02-10]. https://arxiv.org/abs/2405.04434.
[6]	SUTTON RS, BARTO AG. Reinforcement Learning: An Introduction[M]. Cambridge: MIT Press, 1998: 9-11.
[7]	Total Market Value of the U.S. Stock Market[EB/OL]. [2024-02-10]. https://siblisresearch.com/data/us-stock-market-value/.
[8]	National Stock Trading Statistics[EB/OL]. [2024-02-10]. https://data.eastmoney.com/cjsj/gpjytj.html.
[9]	FAN J, YAO Q. The Elements of Financial Econometrics[M]. Cambridge University Press, 2017: 53-58.
[10]	CHODAKOWSKA E, NAZARKO J, NAZARKO Ł. ARIMA Models in Electrical Load Forecasting and Their Robustness to Noise[J]. Energies, 2021, 14(23): 7952. doi: 10.3390/en14237952
[11]	LIU J. Navigating the Financial Landscape: The Power and Limitations of the ARIMA Model[J]. Highlights in Science, Engineering and Technology, 2024, 88: 747-752. doi: 10.54097/9zf6kd91
[12]	PATLE A, CHOUHAN DS. SVM Kernel Functions for Classification[C]. IEEE International Conference on Advances in Technology and Engineering, 2013: 1-9.
[13]	KANAPARTHI V. Robustness Evaluation of LSTM-based Deep Learning Models for Bitcoin Price Prediction in the Presence of Random Disturbances[EB/OL]. [2024-02-10]. https://doi.org/10.21203/rs.3.rs-3906529/v1.
[14]	WATKINS CJ, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8: 279-292.
[15]	VAN HASSELT H, GUEZ A, SILVER D. Deep Reinforcement Learning with Double Q-learning[C]. AAAI Conference on Artificial Intelligence, 2016, 30(1): 2094-2100.
[16]	NIU H, LI S, LI J. MetaTrader: An Reinforcement Learning Approach Integrating Diverse Policies for Portfolio Optimization[C]. ACM International Conference on Information & Knowledge Management, 2022: 1573-1583.
[17]	ZONG C, WANG C, QIN M, et al. MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading[C]. ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024: 4712-4721.
[18]	QIN M, SUN S, ZHANG W, et al. EarnHFT: Efficient Hierarchical Reinforcement Learning for High Frequency Trading[C]. AAAI Conference on Artificial Intelligence, 2024, 38(13): 14669-14676.
[19]	SUN S, XUE W, WANG R, et al. DeepScalper: A Risk-Aware Reinforcement Learning Framework to Capture Fleeting Intraday Trading Opportunities[C]. ACM International Conference on Information & Knowledge Management, 2022: 1858-1867.
[20]	WANG Z, HUANG B, TU S, et al. DeepTrader: A Deep Reinforcement Learning Approach for Risk-Return Balanced Portfolio Management with Market Conditions Embedding[C]. AAAI Conference on Artificial Intelligence, 2021, 35(1): 643-650.
[21]	LIN S, BELING PA. An End-to-End Optimal Trade Execution Framework based on Proximal Policy Optimization[C]. Conference on International Joint Conferences on Artificial Intelligence, 2021: 4548-4554.
[22]	SAWHNEY R, WADHWA A, AGARWAL S, et al. Quantitative Day Trading from Natural Language using Reinforcement Learning[C]. Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021: 4018-4030.
[23]	YE Y, PEI H, WANG B, et al. Reinforcement-learning Based Portfolio Management with Augmented Asset Movement Prediction States[C]. AAAI Conference on Artificial Intelligence, 2020, 34(1): 1112-1119.
[24]	YIN Q Y, YU T T, SHEN S Q, et al. Distributed Deep Reinforcement Learning: A Survey and a Multi-player Multi-agent Learning Toolbox[J]. Machine Intelligence Research, 2024: 411-430.
[25]	LIU XY, YANG H, GAO J, et al. FinRL: Deep Reinforcement Learning Framework to Automate Trading in Quantitative Finance[C]. ACM International Conference on AI in Finance, 2021: 1-9.
[26]	LI Z, LIU XY, ZHENG J, et al. FinRL-Podracer: High Performance and Scalable Deep Reinforcement Learning for Quantitative Finance[C]. ACM International Conference on AI in Finance, 2021: 1-9.

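Model	Dataset	RL algorithm	Applicable market type	Task	Challenges	Strategy
DeepScalper	Wind^[1]	BDQN	Macroeconomic-cycle-sensitive	PM	Market uncertainty, risk control	Uses an encoder-decoder architecture to capture macro- and micro-level market information; adds a risk-aware auxiliary component
DeepTrader	Market indices^[2]	Policy-Based	Macroeconomic-cycle-sensitive	PM	Inter-stock correlations, market uncertainty	Combines TCL, SAM, GCL, and LSTM+attention to capture temporal dynamics, inter-stock correlations, and market-change information
ETEO	WRDS^[3]	PPO	Macroeconomic-cycle-sensitive	TE	Long-range dependence	Combines LSTM and FCN to capture long- and short-range dependencies
SARL	Manually annotated	DDPG	Policy-sensitive	PM	Data heterogeneity, non-stationarity	Augments the state with news data
PROFIT	US S&P 500	DDPG	Policy-sensitive	TE	Market uncertainty	Augments the state with news and social media data
EarnHFT	Cryptocurrency	DDQN	High-volatility	HFT	High volatility, long-range dependence	Uses hierarchical reinforcement learning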

Model	TR	SR	VOL	MDD	CPU time
EIIE	14.64%	1.304	0.7189%	6.743%	1 h 1 min 43 s
SARL	19.61%	1.488	0.8119%	8.0916%	1 h 25 min 33 s
DeepTrader	18.42%	0.2352	0.7408%	48.98%	1 h 6 min 42 s
DeepScalper	18.21%	0.4998	1.789%	42.13%	1 min 23 s
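
To make the columns above more concrete, the following minimal sketch shows one common way to compute such backtest metrics from a series of portfolio values. It assumes that TR, VOL, and MDD denote total return, volatility of per-period returns, and maximum drawdown. The exact definitions used for the table (for example, whether VOL is annualised) are not stated in this section, and all function names are hypothetical.

```python
import math


def total_return(values):
    """Relative change from the first to the last portfolio value."""
    return values[-1] / values[0] - 1.0


def period_returns(values):
    """Simple per-period returns derived from consecutive values."""
    return [values[t] / values[t - 1] - 1.0 for t in range(1, len(values))]


def volatility(values):
    """Sample standard deviation of per-period returns (not annualised)."""
    rets = period_returns(values)
    n = len(rets)
    if n < 2:
        raise ValueError("need at least three portfolio values")
    mean = sum(rets) / n
    return math.sqrt(sum((r - mean) ** 2 for r in rets) / (n - 1))


def max_drawdown(values):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                      # highest value seen so far
        worst = max(worst, (peak - v) / peak)    # decline from that peak
    return worst


if __name__ == "__main__":
    # Illustrative values only; not data from the paper.
    demo = [100.0, 120.0, 90.0, 110.0]
    print(f"TR={total_return(demo):.2%}  VOL={volatility(demo):.2%}  "
          f"MDD={max_drawdown(demo):.2%}")
```

Reporting MDD next to TR matters because two strategies with similar total returns can differ sharply in their worst peak-to-trough loss, which is the pattern the table shows for SARL and DeepTrader.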