Frontiers of Data and Computing ›› 2026, Vol. 8 ›› Issue (1): 77-90.

CSTR: 32002.14.jfdc.CN10-1649/TP.2026.01.007

doi: 10.11871/jfdc.issn.2096-742X.2026.01.007

• Technology and Applications •

A Dynamic Scheduling Method for Police Resources Based on Bayesian Networks and Reinforcement Learning

LIU Chunlong1(),MA Qiuping1,WANG Runsheng2,HU Jinming1,3,HU Xiaofeng1,3,*()   

  1. School of Information and Cyber Security, People’s Public Security University of China, Beijing 100038, China
    2. Senior Police Officer Academy, Ministry of Public Security, Beijing 100045, China
    3. Key Laboratory of Security Prevention Technology and Risk Assessment, Ministry of Public Security, People’s Public Security University of China, Beijing 102623, China
  • Received: 2025-06-22 Online: 2026-02-20 Published: 2026-02-02
  • Contact: HU Xiaofeng
  • About the authors: LIU Chunlong is a master’s student at the People’s Public Security University of China. His research interests include reinforcement learning and public security risk assessment. In this paper, he was responsible for the literature review and the writing of the manuscript.
    E-mail: 2228154580@qq.com
    HU Xiaofeng, Ph.D., is an associate professor at the People’s Public Security University of China. His main research interests include risk assessment and predictive early-warning technology. In this paper, he was mainly responsible for revising the manuscript.
    E-mail: huxiaofeng@ppsuc.edu.cn
  • Supported by: National Natural Science Foundation of China (72174203)


Abstract:

[Objective] Traditional fixed police resource allocation cannot respond promptly to dynamic changes in regional crime risk, and the dynamic, coordinated optimization of multiple types of police resources remains insufficient. [Methods] To address these problems, this paper proposes a dynamic police resource scheduling method based on Bayesian networks and reinforcement learning. The method first uses a Bayesian network to evaluate the crime risk of each region, and then employs reinforcement learning algorithms to obtain the optimal police resource scheduling strategy. To verify the effectiveness of the method, five different resource scheduling schemes were designed, using a district of a large northern Chinese city as a case study. [Results] Experimental results show that, among the reinforcement learning models, the DQN algorithm achieved the best training performance (with a reward value of 1,755.82), and the reinforcement learning method reduced the expected risk value by 6.68% compared with traditional allocation. Nonlinear fitting of resources against risk indicates that resource input within 1.1 to 1.2 times the baseline value yields the best cost-benefit ratio. The findings are applicable to the rational allocation of police resources in urban public security.

Key words: Bayesian networks, reinforcement learning, police resource allocation, crime risk assessment, DQN algorithm, nonlinear regression