STA-TSN: Spatial-Temporal Attention Temporal Segment Network for action recognition in video

Published:2022-03-17 Issue:3 Volume:17 Page:e0265115
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Yang Guoan, Yang Yong, Lu Zhengzhi, Yang Junjie, Liu Deyang, Zhou Chuanbo, Fan Zien

Abstract

Most deep learning-based action recognition models focus only on short-term motion, so they often misclassify actions composed of multiple stages, such as the long jump and the high jump. Temporal Segment Networks (TSN) enable a network to capture long-term information in a video, but they ignore the fact that unrelated frames or regions in the video can strongly interfere with action recognition. To solve this problem, a soft attention mechanism is introduced into TSN, yielding the Spatial-Temporal Attention Temporal Segment Network (STA-TSN), which retains the ability to capture long-term information and enables the network to adaptively focus on key features in space and time. First, a multi-scale spatial focus feature enhancement strategy is proposed to fuse the original convolutional features with multi-scale spatial focus features obtained through a soft attention mechanism with spatial pyramid pooling. Second, a deep learning-based key frame exploration module is designed, which uses a soft attention mechanism based on Long Short-Term Memory (LSTM) to adaptively learn temporal attention weights. Third, a temporal-attention regularization is developed to guide STA-TSN toward better key frame exploration. Finally, experimental results show that the proposed STA-TSN outperforms TSN on four public datasets (UCF101, HMDB51, JHMDB, and THUMOS14) and achieves state-of-the-art results.
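
The temporal soft attention mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: in the paper the attention weights are learned by an LSTM-based module, whereas here the per-segment scores are simply given as inputs. The names `softmax`, `attend`, `features`, and `scores` are hypothetical and chosen only for illustration. The sketch shows the generic idea of soft attention: scores are normalized into weights that sum to 1, and the video-level feature is the weighted sum of per-segment features.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(segment_features, scores):
    """Pool per-segment feature vectors with soft attention.

    segment_features: one feature vector (list of floats) per segment.
    scores: one relevance score per segment (assumed given here; the
            paper learns them with an LSTM-based module).
    Returns the attention weights and the weighted-sum feature vector.
    """
    weights = softmax(scores)
    dim = len(segment_features[0])
    pooled = [
        sum(w * feat[d] for w, feat in zip(weights, segment_features))
        for d in range(dim)
    ]
    return weights, pooled


# Three segments with 2-D features and equal scores (illustrative values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]
weights, pooled = attend(features, scores)
```

With equal scores every segment receives the same weight, which reduces to plain average pooling over segments; raising one segment's score shifts the pooled feature toward that segment, which is the sense in which soft attention emphasizes key frames.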

Funder

National Natural Science Foundation of China

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

References (50 articles)

1. A review of multimodal human activity recognition with special emphasis on classification, applications, challenges and future directions;SK Yadav;Knowledge-Based Systems,2021

2. A smartphone sensors-based personalized human activity recognition system for sustainable smart cities;AR Javed;Sustainable Cities and Society,2021

3. Parciv: recognizing physical activities having complex interclass variations using semantic data of smartphone;M Usman Sarwar;Software: Practice and Experience,2021

4. A survey on video-based human action recognition: recent updates, datasets, challenges, and applications;P Pareek;Artificial Intelligence Review,2021

5. Illumination and scale invariant relevant visual features with hypergraph-based learning for multi-shot person re-identification;A Nanda;Multimedia Tools and Applications,2019

Cited by 20 articles.

1. Volleyball training video classification description using the BiLSTM fusion attention mechanism;Heliyon;2024-08

2. Spatial-temporal multiscale feature optimization based two-stream convolutional neural network for action recognition;Cluster Computing;2024-06-01

3. Spatio‐temporal attention modules in orientation‐magnitude‐response guided multi‐stream CNNs for human action recognition;IET Image Processing;2024-04-22

4. MHAiR: A Dataset of Audio-Image Representations for Multimodal Human Actions;Data;2024-01-25

5. Multimodal fusion for audio-image and video action recognition;Neural Computing and Applications;2024-01-09