Intelligent Scoring Method for Police Training Action Based on Multi-Stream Graph-Temporal Fusion Network

doi:10.11871/jfdc.issn.2096-742X.2026.03.015

Abstract

[Purpose] Police training assessments that rely on manual scoring suffer from subjective evaluation, low efficiency, insufficient standardization, and difficulty in quantifying action quality. To address these issues, an intelligent scoring method for police training actions based on a Multi-Stream Graph-Temporal Fusion Network (MS-GTFN) is proposed. The method gives police officers a reliable quality reference for independent training. [Methods] First, four spatiotemporal feature streams (the joint stream, the bone stream, and their corresponding motion streams) are constructed to encode both the structural and the dynamic characteristics of movements. Second, Graph Convolutional Networks (GCNs) and Temporal Convolutional Networks (TCNs) are employed in parallel to extract fused spatiotemporal features of actions. Third, channel attention (CA) and spatial self-attention (SSA) modules are introduced to strengthen the model's focus on key features. Finally, a multi-layer perceptron (MLP) predicts the action score. [Results] The proposed model is trained and validated on a self-built police training dataset. On the action score prediction task it achieves an MAE of 0.4553, an MSE of 0.3507, and an R² of 0.9306. [Conclusions] The method shows advantages in fusing training action features and in scoring accuracy, providing more reliable support for intelligent scoring of police training action quality.
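
The four input streams named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the 5-joint skeleton, the `PARENTS` array, the function name `build_streams`, and the zero-padding of the last frame are all hypothetical choices made for illustration. It follows the common convention in skeleton-based models that a bone is the vector from a parent joint to its child, and a motion stream is the frame-to-frame difference.

```python
import numpy as np

# Hypothetical 5-joint skeleton (an assumption, not from the paper):
# PARENTS[i] is the parent of joint i; the root (joint 0) points to itself.
PARENTS = np.array([0, 0, 1, 2, 3])


def motion(x):
    """Frame-to-frame difference, zero-padded at the last frame to keep length T."""
    m = np.zeros_like(x)
    m[:-1] = x[1:] - x[:-1]
    return m


def build_streams(joints):
    """Build four streams from joints of shape (T, V, C) = (frames, joints, coords)."""
    # Bone stream: vector from each joint's parent to the joint (zero for the root).
    bones = joints - joints[:, PARENTS, :]
    return {
        "joint": joints,
        "bone": bones,
        "joint_motion": motion(joints),
        "bone_motion": motion(bones),
    }


# Random coordinates stand in for estimated 3D poses (8 frames, 5 joints).
rng = np.random.default_rng(0)
streams = build_streams(rng.normal(size=(8, 5, 3)))
for name, value in streams.items():
    print(name, value.shape)
```

All four streams keep the same shape, so under this assumption each can be fed to its own graph-temporal branch before fusion.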

Key words: graph convolutional networks, temporal convolutional networks, multi-stream fusion, attention module, police force training, scoring of action quality

ZHANG Peijing, YAN Jiaxin, WANG Xiaoxuan, LI Junjie, ZENG Yunfei. Intelligent Scoring Method for Police Training Action Based on Multi-Stream Graph-Temporal Fusion Network[J]. Frontiers of Data and Computing, 2026, 8(3): 181-190.

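Action ID	Action name	Equipment type
G01	Upward baton extension	Baton
G02	Downward baton extension	Baton
G03	Emergency baton extension	Baton
G04	Upper chopping strike	Baton
G05	Lower chopping strike	Baton
G06	Baton push strike	Baton
G07	Raised-hand ready stance	Tear-gas spray
G08	Hand-on-equipment ready stance	Tear-gas spray
G09	Tear-gas spraying	Tear-gas spray
G10	Upper shield strike	Shield
G11	Left-side shield strike	Shield
G12	Right-side shield strike	Shield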

Model type	MAE	MSE	R²
J-GTFN	0.5437	0.5375	0.8551
B-GTFN	0.6142	0.5542	0.8325
JM-GTFN	0.5012	0.5046	0.9176
BK-GTFN	0.5183	0.539	0.8859
MS-GTFN	0.4553	0.3507	0.9306
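
For readers who want to reproduce the three reported metrics, the following is a minimal sketch using the standard definitions of MAE, MSE, and the coefficient of determination R². The paper's exact evaluation code is not shown here, so the function name `regression_metrics` and the sample scores are illustrative assumptions, not data from the study.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) under their standard definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                       # mean absolute error
    mse = np.mean(err ** 2)                          # mean squared error
    ss_res = np.sum(err ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, mse, r2


# Made-up expert scores and predictions, for illustration only.
mae, mse, r2 = regression_metrics([6.0, 7.0, 8.0, 9.0], [6.5, 7.0, 7.5, 9.0])
print(mae, mse, r2)
```

Lower MAE and MSE and an R² closer to 1 indicate predictions that track the reference scores more closely, which is how the rows of the table are compared.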

Action	MAE	MSE	R²
G01	0.3913	0.2568	0.966
G02	0.4018	0.3195	0.9031
G03	0.4582	0.3293	0.9499
G04	0.4959	0.4225	0.9237
G05	0.4839	0.3687	0.9464
G06	0.4943	0.447	0.9486
G07	0.3071	0.1496	0.9831
G08	0.3492	0.2147	0.9515
G09	0.5468	0.4883	0.9336
G10	0.5064	0.4168	0.8927
G11	0.5072	0.4292	0.8656
G12	0.522	0.5527	0.8482