Frontiers of Data and Computing ›› 2024, Vol. 6 ›› Issue (5): 66-79.

CSTR: 32002.14.jfdc.CN10-1649/TP.2024.05.007

doi: 10.11871/jfdc.issn.2096-742X.2024.05.007

Research on Video Behavior Recognition Method with Active Perception Mechanism

YAN Zhiyu1, RU Yiwei2, SUN Fupeng3, SUN Zhenan2,*

  1. School of Computer Science, Beijing Institute of Technology, Beijing 102488, China
    2. State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
    3. School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 102488, China
  • Received: 2023-08-09  Online: 2024-10-20  Published: 2024-10-21

Abstract:

[Purpose] In video behavior recognition, effectively focusing on important regions in video frames while making full use of spatiotemporal information is a significant research problem. [Methods] This paper proposes an Active Perception Mechanism (APM) that actively perceives crucial regions in videos. Specifically, the method employs a novel network model based on a spatiotemporal multi-scale attention mechanism, which builds a “scrutinizing-browsing” network. The scrutinizing branch and the browsing branch each embed a Multiscale Vision Transformer structure, equipping the model with self-attention-driven initiative in perceiving important regions and with spatiotemporal multi-scale initiative at each stage of data processing. To maintain the consistency of inter-frame information while obtaining augmented data that improves robustness, we introduce a multi-dual-random data augmentation method for sample amplification and data enhancement. [Results] On the large-scale human behavior recognition benchmarks Kinetics-400 and Kinetics-600, the proposed method achieves competitive results.
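To make the two-branch idea described above concrete, the following is a minimal, hypothetical PyTorch sketch of a “scrutinizing-browsing” model. The module names (BackboneStub, APMModel), the tiny 3D-convolution stand-ins for the Multiscale Vision Transformer backbones, the concatenation-based fusion, and the clip-sampling scheme are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a two-branch "scrutinizing-browsing" classifier.
# All names and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn


class BackboneStub(nn.Module):
    """Stand-in for an MViT-style spatiotemporal backbone (assumption)."""
    def __init__(self, in_ch: int = 3, dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_ch, dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),  # global spatiotemporal pooling
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T, H, W) -> (B, dim)
        return self.features(x).flatten(1)


class APMModel(nn.Module):
    """Browsing branch sees a coarse, global view of the clip; the
    scrutinizing branch sees a denser view focused on a perceived
    important region. Features are fused by concatenation (assumed)."""
    def __init__(self, num_classes: int = 400, dim: int = 128):
        super().__init__()
        self.browse = BackboneStub(dim=dim)      # coarse, global view
        self.scrutinize = BackboneStub(dim=dim)  # fine, focused view
        self.head = nn.Linear(2 * dim, num_classes)

    def forward(self, browse_clip: torch.Tensor,
                scrutinize_clip: torch.Tensor) -> torch.Tensor:
        f_browse = self.browse(browse_clip)
        f_scrut = self.scrutinize(scrutinize_clip)
        return self.head(torch.cat([f_browse, f_scrut], dim=1))


if __name__ == "__main__":
    model = APMModel(num_classes=400)
    # Browsing view: 8 coarsely sampled frames; scrutinizing view: 16 frames
    # around an important region (both sampling choices are assumptions).
    browse = torch.randn(2, 3, 8, 112, 112)
    scrutinize = torch.randn(2, 3, 16, 112, 112)
    logits = model(browse, scrutinize)
    print(logits.shape)  # torch.Size([2, 400])
```

In this sketch, each stub would be replaced by a Multiscale Vision Transformer backbone, and any frame-consistent random augmentation (as the multi-dual-random scheme suggests) would be applied identically to all frames of a clip before the two views are formed.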

Key words: action recognition, self-attention mechanism, deep learning, video, Transformer