Video action recognition collaborative learning with dynamics via PSO-ConvNet Transformer

Published:2023-09-05 Issue:1 Volume:13
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Nguyen Huu Phong, Ribeiro Bernardete

Abstract

Recognizing human actions in video sequences, known as Human Action Recognition (HAR), is a challenging task in pattern recognition. While Convolutional Neural Networks (ConvNets) have shown remarkable success in image recognition, they are not always directly applicable to HAR, as temporal features are critical for accurate classification. In this paper, we propose a novel dynamic PSO-ConvNet model for learning actions in videos, building on our recent work in image recognition. Our approach leverages a framework where the weight vector of each neural network represents the position of a particle in phase space, and particles share their current weight vectors and gradient estimates of the loss function. To extend our approach to video, we integrate ConvNets with state-of-the-art temporal methods such as Transformer and Recurrent Neural Networks. Our experimental results on the UCF-101 dataset demonstrate substantial improvements of up to 9% in accuracy, which confirms the effectiveness of our proposed method. In addition, we conducted experiments on larger and more varied datasets, including Kinetics-400 and HMDB-51, and found that Collaborative Learning was preferable to Non-Collaborative Learning (Individual Learning). Overall, our dynamic PSO-ConvNet model provides a promising direction for improving HAR by better capturing the spatio-temporal dynamics of human actions in videos. The code is available at https://github.com/leonlha/Video-Action-Recognition-Collaborative-Learning-with-Dynamics-via-PSO-ConvNet-Transformer.
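The collaborative idea in the abstract (each network's weight vector is a particle position, and particles share weights and gradient estimates) can be illustrated with a toy sketch. This is not the authors' update rule: the quadratic `loss`, the function name `collaborative_step`, and the parameters `lr`, `share`, and `pull` are all hypothetical choices made only for illustration, and a two-dimensional list stands in for a full ConvNet weight vector.

```python
import random

TARGET = (1.0, -2.0)  # assumed minimum of the toy loss (illustrative only)

def loss(w):
    # Toy quadratic loss; stands in for a network's training loss.
    return (w[0] - TARGET[0]) ** 2 + (w[1] - TARGET[1]) ** 2

def grad(w):
    # Analytic gradient of the toy loss above.
    return [2.0 * (w[0] - TARGET[0]), 2.0 * (w[1] - TARGET[1])]

def collaborative_step(positions, lr=0.1, share=0.5, pull=0.2):
    """One synchronous update for all particles (assumed rule, not the paper's).

    Each particle mixes its own gradient with the mean of the shared
    gradients, then is pulled toward the currently best particle's weights.
    """
    grads = [grad(w) for w in positions]
    n = len(positions)
    mean_grad = [sum(g[d] for g in grads) / n for d in range(2)]
    best = min(positions, key=loss)  # particle with the lowest loss
    updated = []
    for w, g in zip(positions, grads):
        mixed = [(1.0 - share) * g[d] + share * mean_grad[d] for d in range(2)]
        updated.append([w[d] - lr * mixed[d] + pull * (best[d] - w[d])
                        for d in range(2)])
    return updated

def train(n_particles=4, steps=100, seed=0):
    # Random initial weights; every particle shares information at each step.
    rng = random.Random(seed)
    positions = [[rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
                 for _ in range(n_particles)]
    for _ in range(steps):
        positions = collaborative_step(positions)
    return positions

final_positions = train()
best_loss = min(loss(w) for w in final_positions)
```

Setting `share=0.0` and `pull=0.0` reduces every particle to plain gradient descent on its own, which corresponds loosely to the Individual Learning baseline mentioned in the abstract.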

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary

Link

https://www.nature.com/articles/s41598-023-39744-9.pdf

Cited by 7 articles.

1. Human action recognition using an optical flow-gated recurrent neural network;International Journal of Multimedia Information Retrieval;2024-07-16

2. Deep Learning Innovations in Video Classification: A Survey on Techniques and Dataset Evaluations;Electronics;2024-07-11

3. Weighted voting ensemble of hybrid CNN-LSTM Models for vision-based human activity recognition;Multimedia Tools and Applications;2024-06-08

4. Human action recognition with transformer based on convolutional features;Intelligent Decision Technologies;2024-06-07

5. Utilizing Gyroscope Data for Classifying Types of Fencer Movements in an Assistive Coaching System;2024 6th International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE);2024-02-29