A Deep Bidirectional LSTM Model Enhanced by Transfer-Learning-Based Feature Extraction for Dynamic Human Activity Recognition
Published: 2024-01-10
Issue: 2
Volume: 14
Page: 603
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Authors:
Hassan, Najmul 1; Miah, Abu Saleh Musa 1; Shin, Jungpil 1
Affiliation:
1. School of Computer Science and Engineering, University of Aizu, Aizuwakamatsu 965-8580, Japan
Abstract
Dynamic human activity recognition (HAR) is a domain of study that is currently receiving considerable attention within the fields of computer vision and pattern recognition. The growing need for artificial-intelligence (AI)-driven systems to evaluate human behaviour and bolster security underscores the timeliness of this research. Although numerous researchers have developed dynamic HAR frameworks that employ diverse pre-trained architectures for feature extraction and classification, existing systems still suffer from suboptimal accuracy and high computational complexity. These challenges arise from the scale of video-based datasets and the strong similarity among their samples. To address them, we propose a dynamic HAR technique built on a deep bidirectional long short-term memory (Deep BiLSTM) model combined with a pre-trained, transfer-learning-based feature-extraction approach. Our approach begins by using a Convolutional Neural Network (CNN), specifically MobileNetV2, to extract deep features from video frames. These features are then fed into an optimized Deep BiLSTM network, which models temporal dependencies across the frame sequence to produce the final predictions. During the testing phase, an iterative fine-tuning procedure updates the hyperparameters of the trained model, ensuring adaptability to varying scenarios. The proposed model's efficacy was rigorously evaluated on three benchmark datasets, namely UCF11, UCF Sports, and JHMDB, achieving notable accuracies of 99.20%, 93.3%, and 76.30%, respectively. This high accuracy substantiates the effectiveness of the proposed model and signals a promising advancement in the domain of activity recognition.
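The pipeline described in the abstract (per-frame CNN features followed by a stacked BiLSTM classifier) can be summarized in code. Below is a minimal, hypothetical Keras sketch, assuming a frozen ImageNet-pretrained MobileNetV2 as the frame-level feature extractor and a two-layer BiLSTM head; the sequence length, frame resolution, layer widths, and optimizer are illustrative assumptions, not the settings reported in the paper.

```python
# Sketch of a transfer-learning + Deep BiLSTM activity-recognition pipeline.
# Hyperparameters below are illustrative assumptions, not the paper's values.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES = 16     # assumed frames sampled per clip
IMG_SIZE = 224      # MobileNetV2's default input resolution
NUM_CLASSES = 11    # e.g., UCF11 has 11 action categories

# Transfer-learning feature extractor: ImageNet-pretrained MobileNetV2,
# frozen so only the recurrent head is trained initially.
backbone = tf.keras.applications.MobileNetV2(
    include_top=False, weights="imagenet",
    input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling="avg")
backbone.trainable = False

frames = layers.Input(shape=(NUM_FRAMES, IMG_SIZE, IMG_SIZE, 3))  # raw RGB in [0, 255]
x = layers.Rescaling(scale=1.0 / 127.5, offset=-1.0)(frames)      # MobileNetV2 expects [-1, 1]
x = layers.TimeDistributed(backbone)(x)                           # -> (batch, frames, 1280)

# "Deep" BiLSTM: two stacked bidirectional LSTM layers over the frame features.
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
x = layers.Bidirectional(layers.LSTM(64))(x)
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(frames, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

TimeDistributed applies the same frozen backbone to every frame, so the only trainable parameters are in the BiLSTM layers and the classifier, which keeps training cost low on large video datasets.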
Funder
The Competitive Research Fund of The University of Aizu, Japan
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
1 article.