Spatio‐temporal attention modules in orientation‐magnitude‐response guided multi‐stream CNNs for human action recognition

Published: 2024-04-22 Issue: 9 Volume: 18 Pages: 2372-2388
ISSN: 1751-9659
Journal: IET Image Processing
Language: en

Author:

Khezerlou Fatemeh¹, Baradarani Aryaz², Balafar Mohammad Ali¹, Maev Roman Gr.²³

Affiliation:

1. Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran

2. Center for Diagnostic Imaging Research, Tessonics Inc., Windsor, Ontario, Canada

3. Institute for Diagnostic Imaging Research, University of Windsor, Windsor, Ontario, Canada

Abstract

This paper introduces a new descriptor, the orientation‐magnitude response map, which encodes motion patterns in a single 2D image. A boosted multi‐stream CNN‐based model with several attention modules is then designed for human action recognition. The model incorporates a convolutional self‐attention autoencoder to produce compressed, high‐level motion features, and sequential convolutional self‐attention modules exploit the implicit relationships within motion patterns. In addition, a 2D discrete wavelet transform decomposes RGB frames into discriminative coefficients, providing supplementary spatial information related to the actors' actions. A spatial attention block, implemented through a weighted inception module in a CNN‐based structure, weighs the multi‐scale neighbours of image patches. Local and global body pose features are combined by extracting informative joints based on geometric features and joint trajectories in 3D space. A multi‐scale channel attention module is proposed to capture the importance of specific channels in the pose descriptors. A boosted CNN‐based model is designed for each data modality, and the action predictions from the different streams are fused. The proposed model is evaluated on multiple datasets, including HMDB51, UTD‐MHAD, and MSR‐daily activity, and the results indicate its potential for action recognition.
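As a rough illustration of the wavelet step mentioned in the abstract, the sketch below applies a single‐level 2D Haar transform to a small grayscale frame and returns four sub‐bands (approximation plus three detail bands). The abstract does not state which wavelet family or how many decomposition levels the authors use, so the Haar filter, the single level, and the function name `haar_dwt2` are all assumptions made here for illustration only.

```python
def haar_dwt2(frame):
    """Single-level 2D Haar DWT of a grayscale frame (list of equal-length rows).

    Illustrative only: the Haar wavelet and the single level are assumptions,
    not details taken from the paper.
    Returns (approx, row_detail, col_detail, diag_detail), each half-size.
    """
    rows, cols = len(frame), len(frame[0])
    if rows % 2 or cols % 2:
        raise ValueError("frame height and width must be even")

    approx, row_detail, col_detail, diag_detail = [], [], [], []
    for r in range(0, rows, 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block of pixels handled at this step.
            p00, p01 = frame[r][c], frame[r][c + 1]
            p10, p11 = frame[r + 1][c], frame[r + 1][c + 1]
            # Orthonormal Haar in 2D: each coefficient is scaled by 1/2.
            a_row.append((p00 + p01 + p10 + p11) / 2)  # low-pass average
            r_row.append((p00 + p01 - p10 - p11) / 2)  # difference between rows
            c_row.append((p00 - p01 + p10 - p11) / 2)  # difference between columns
            d_row.append((p00 - p01 - p10 + p11) / 2)  # diagonal difference
        approx.append(a_row)
        row_detail.append(r_row)
        col_detail.append(c_row)
        diag_detail.append(d_row)
    return approx, row_detail, col_detail, diag_detail


# Usage: a 2x2 frame yields one coefficient per sub-band.
approx, row_detail, col_detail, diag_detail = haar_dwt2([[1, 2], [3, 4]])
print(approx, row_detail, col_detail, diag_detail)
```

In a pipeline like the one described, each sub‐band could be fed to a CNN stream as an extra spatial input. How the authors actually select and combine the coefficients is not specified in the abstract.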

Publisher

Institution of Engineering and Technology (IET)

Cited by 1 article.

1. Recognizing human activities with the use of Convolutional Block Attention Module. Egyptian Informatics Journal, 2024-09.