Frontiers of Data and Computing ›› 2026, Vol. 8 ›› Issue (3): 166-180.

doi: 10.11871/jfdc.issn.2096-742X.2026.03.014

• Technology and Application •

Multi-Layer Game System of Power Supply Chain Based on Reinforcement Learning

NIU Xinxin1, LIU Yuxuan2, WANG Yijing3, YOU Bo4,*, LI Xueen4

    1 CHN Energy New Energy Technology Research Institute Co., Ltd, Beijing 102209, China
    2 Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
    3 Harbin Institute of Technology, Harbin, Heilongjiang 150001, China
    4 Tianjin Zhongke Intelligent Identification Co., Ltd, Tianjin 300457, China
  • Received: 2025-10-29 Online: 2026-06-20 Published: 2026-06-18
  • Contact: YOU Bo E-mail: 16810093@ceic.com; youbo2019@ia.ac.cn

Abstract:

[Objective] This study addresses multi-level game optimization in the coal and power supply chain, taking China’s provincial coal-fired power supply chain as its setting. Traditional rule-based decision-making and game-theoretic equilibrium solutions struggle with dynamic, high-dimensional action spaces and with participants that learn and adapt. To overcome these limitations, the study constructs a multi-agent model comprising provincial operators, municipal power plants, and coal mines. [Methods] The model uses a Stackelberg game framework for hierarchical coordination between levels, uses Nash equilibrium to model competition within each level, and applies the TD3BC reinforcement learning algorithm to optimize agent decision-making. A uniform-price auction clears the market and matches supply with demand. [Results] Among three power plant game objectives (profit protection, pure cost optimization, and market-based bidding), market-based bidding achieves the best overall system efficiency and supply-demand balance. In addition, compared with traditional rule-based decision-making, the TD3BC algorithm significantly improves total system profit, market efficiency, and stability. [Limitations] The study relies on simplified market parameters and does not consider factors present in real markets, such as transportation topology, long-term contracts, and unit constraints. [Conclusions] Combining reinforcement learning with multi-layer game theory can effectively optimize decision-making in the power supply chain and provides theoretical support for the integrated operation of coal and electricity. The market-based bidding strategy is better suited to scenarios that prioritize system efficiency and profit growth.
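
As an illustration of the market clearing step mentioned in the abstract, the following is a minimal sketch of a generic uniform-price auction, not the paper's implementation. The `Offer` type, the `clear_uniform_price` function, and all numbers are hypothetical and chosen only for this example. It assumes fixed, price-inelastic demand and sets the clearing price to the price of the marginal (last accepted) offer.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    plant: str       # hypothetical plant identifier
    quantity: float  # offered energy, MWh
    price: float     # offer price, currency units per MWh


def clear_uniform_price(offers, demand):
    """Accept the cheapest offers first until demand is met.

    Returns (clearing_price, dispatch, unmet_demand). Every accepted plant
    is paid the same clearing price: the price of the last accepted offer.
    clearing_price is None when nothing is dispatched.
    """
    dispatch = {}
    remaining = demand
    clearing_price = None
    for offer in sorted(offers, key=lambda o: o.price):
        if remaining <= 0:
            break
        accepted = min(offer.quantity, remaining)
        dispatch[offer.plant] = dispatch.get(offer.plant, 0.0) + accepted
        remaining -= accepted
        clearing_price = offer.price  # marginal offer sets the price
    return clearing_price, dispatch, max(remaining, 0.0)


if __name__ == "__main__":
    # Illustrative offers only; not data from the study.
    offers = [Offer("A", 100, 300), Offer("B", 80, 280), Offer("C", 120, 350)]
    price, dispatch, unmet = clear_uniform_price(offers, demand=150)
    print(price, dispatch, unmet)
```

In this sketch, plant B is dispatched in full, plant A covers the remaining 70 MWh, and both are paid plant A's offer price of 300. A learning agent in such a market would choose the `price` (and possibly `quantity`) of its offer as its action.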

Key words: power supply chain, multi-layer game, reinforcement learning, market clearing mechanism, TD3BC algorithm, Stackelberg game, supply and demand balance