A Dynamic Position Embedding-Based Model for Student Classroom Complete Meta-Action Recognition
Author:
Shou Zhaoyu 1,2, Yuan Xiaohu 1, Li Dongxu 1, Mo Jianwen 1, Zhang Huibing 3, Zhang Jingwei 4, Wu Ziyong 4
Affiliation:
1. School of Information and Communication, Guilin University of Electronic Technology, Guilin 541004, China
2. Guangxi Wireless Broadband Communication and Signal Processing Key Laboratory, Guilin University of Electronic Technology, Guilin 541004, China
3. School of Computer and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
4. Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
Abstract
Accurately recognizing students' complete classroom meta-actions is a key challenge in the personalized, adaptive interpretation of student behavior, owing to the complexity of these actions. This paper proposes a Dynamic Position Embedding-based model for Student classroom complete meta-Action Recognition (DPE-SAR), built on the Video Swin Transformer. The model uses a dynamic position embedding technique to perform conditional positional encoding and incorporates a deep convolutional network to improve parsing of the spatial structure of meta-actions. The full attention mechanism of ViT3D is used to extract latent spatial features of actions and to capture the global spatial-temporal information of meta-actions. In evaluations on public datasets and a smart-classroom meta-action recognition dataset, the proposed model outperforms baseline models on action recognition; the experimental results confirm its superiority for meta-action recognition.
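The abstract does not give implementation details, but the combination of "dynamic position embedding" with "conditional positional encoding" suggests a CPVT-style scheme, in which positional information is generated from the token content itself by a depthwise convolution over the token grid and added back as a residual. The sketch below is a minimal, illustrative NumPy version of that idea for a 3D (time, height, width) token grid; the function name, shapes, and kernel size are assumptions, not the paper's actual implementation.

```python
import numpy as np

def dynamic_position_embedding(tokens, kernels):
    """Illustrative conditional positional encoding (assumed, CPVT-style):
    a depthwise 3x3x3 convolution over the (T, H, W) token grid whose
    output is added back to the tokens as a residual, so positions are
    encoded conditionally on the token content.

    tokens:  (T, H, W, C) array of transformer tokens on their 3D grid
    kernels: (C, 3, 3, 3) array, one depthwise kernel per channel
    """
    T, H, W, C = tokens.shape
    # zero-pad the spatial-temporal grid so the output keeps its shape
    padded = np.pad(tokens, ((1, 1), (1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(tokens)
    for t in range(T):
        for h in range(H):
            for w in range(W):
                # 3x3x3 neighbourhood around this token, all channels
                patch = padded[t:t + 3, h:h + 3, w:w + 3, :]  # (3, 3, 3, C)
                # depthwise: each channel uses only its own kernel
                out[t, h, w] = np.einsum('ijkc,cijk->c', patch, kernels)
    return tokens + out  # residual connection
```

Because the embedding is computed from the tokens rather than looked up in a fixed table, it adapts to the input resolution, which is the usual motivation for conditional positional encodings in vision transformers. A production version would use a learned `Conv3d` layer rather than explicit loops.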
Funder
- National Natural Science Foundation of China
- Guangxi Natural Science Foundation
- Project of Guangxi Wireless Broadband Communication and Signal Processing Key Laboratory
- Innovation Project of Guangxi Graduate Education
- Project for Improving the Basic Scientific Research Abilities of Young and Middle-aged Teachers in Guangxi Colleges and Universities