Frontiers of Data & Computing (数据与计算发展前沿), 2025, Vol. 7, Issue (2): 109-119.

CSTR: 32002.14.jfdc.CN10-1649/TP.2025.02.011

doi: 10.11871/jfdc.issn.2096-742X.2025.02.011

• Technology and Application •

Video Action Recognition Model Based on Attention and Relative Average Discriminator

WANG Qijun1, LIU Tinglong2,*

  1. Dalian Port Communication Engineering Co., LTD., Dalian, Liaoning 116001, China
  2. Center for Information Technology, Dalian Polytechnic University, Dalian, Liaoning 116034, China
  • Received: 2024-09-17 Online: 2025-04-20 Published: 2025-04-23
  • Contact: LIU Tinglong
  • About the authors: WANG Qijun, Master's degree, Senior Engineer, International Project Manager (IPMA), and master's supervisor, is the General Manager of Dalian Port Communication Engineering Co., LTD. He has long been engaged in the development and construction of port software and security software products. His main research interests include intelligent information systems and intelligent information processing.
    In this paper, he is responsible for drafting the manuscript and developing the experiments.
    E-mail: qjwang@cmhk.com
    LIU Tinglong is an experimental teacher at the Center for Information Technology of Dalian Polytechnic University. He has long been engaged in artificial intelligence and in the development and construction of information systems. His main research interests include computer vision and intelligent information systems.
    In this paper, he is responsible for formulating the paper framework and for theoretical analysis and verification.
    E-mail: liutl@dlpu.edu.cn
  • Supported by:
    Science and Technology Plan Project of Liaoning Province (2020JH2/10100032)

Abstract:

[Objective] In video action recognition, obtaining richer action features is a central research problem. Video action features comprise temporal and spatial features, and a model that cannot extract enough spatiotemporal features will produce markedly worse recognition results. [Methods] Super-resolution technology converts low-resolution images into high-resolution images, and some researchers have applied it to low-resolution video action recognition. However, existing methods mainly raise the resolution of video frames from the perspective of the video as a whole and pay insufficient attention to the features of the action itself. To address this problem, this paper proposes ARADSP, a super-resolution network model based on an attention mechanism and a relative average discriminator. [Results] The attention mechanism focuses on the action features in a video that carry more spatiotemporal information, and the relative average discriminator improves data quality and stability. Experiments are conducted on the HMDB51, UCF101, and Something-Something V1&V2 datasets, and extensive experimental results demonstrate the effectiveness of the proposed method for video action recognition.

Key words: attention mechanism, super-resolution, generative adversarial network, relative average discriminator, video action recognition
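
The abstract does not state the loss used by the relative average discriminator. As an illustration only, the sketch below assumes the relativistic average formulation that is common in GAN-based super-resolution, in which each sample's score is judged relative to the mean score of the opposite class. The function names and the use of raw logits are assumptions made for this example and are not taken from the paper.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mean(values):
    return sum(values) / len(values)


def relativistic_average_losses(real_logits, fake_logits):
    """Return (discriminator_loss, generator_loss) for one batch.

    Assumed formulation (not confirmed by the paper): a real sample is
    scored relative to the mean fake logit, and a fake sample relative
    to the mean real logit.
    """
    real_rel = [r - mean(fake_logits) for r in real_logits]
    fake_rel = [f - mean(real_logits) for f in fake_logits]
    # -log(sigmoid(z)) == softplus(-z); -log(1 - sigmoid(z)) == softplus(z).
    d_loss = mean([softplus(-z) for z in real_rel]) + mean(
        [softplus(z) for z in fake_rel]
    )
    # The generator loss swaps the roles of real and fake samples.
    g_loss = mean([softplus(z) for z in real_rel]) + mean(
        [softplus(-z) for z in fake_rel]
    )
    return d_loss, g_loss


if __name__ == "__main__":
    d_loss, g_loss = relativistic_average_losses([2.0, 1.5], [-1.0, -0.5])
    print(f"discriminator loss: {d_loss:.4f}, generator loss: {g_loss:.4f}")
```

Under this assumption the generator receives gradient from both real and generated samples, which is often cited as a reason such discriminators train more stably than a standard GAN discriminator.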