A Normalization Strategy for Weakly Supervised 3D Hand Pose Estimation
Published: 2024-04-24
Issue: 9
Volume: 14
Page: 3578
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Guo Zizhao 1, Li Jinkai 2, Tan Jiyong 3
Affiliation:
1. College of Computer Science, Chengdu University, Chengdu 610106, China
2. College of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
3. Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen 610056, China
Abstract
The effectiveness of deep neural network models is intricately tied to the distribution of their training data. However, in pose estimation, potential discrepancies in root joint positions and inherent variability in biomechanical features across datasets are often overlooked by current training strategies. To address these challenges, a novel Hand Pose Biomechanical Model (HPBM) is developed. In contrast to the traditional 3D coordinate-encoded pose, it provides a more intuitive depiction of the anatomical characteristics of the hand. Through this model, a data normalization approach is implemented to align the root joint and unify the biomechanical features of training samples. Furthermore, the HPBM facilitates a weakly supervised strategy for dataset expansion, significantly enhancing data diversity. The proposed normalization method is evaluated on two widely used 3D hand pose estimation datasets, RHD and STB, demonstrating superior performance compared to models trained without the normalized datasets. Using ground-truth 2D keypoints as input, error reductions of 45.1% and 43.4% are achieved on the STB and RHD datasets, respectively. When using 2D keypoints from MediaPipe, error reductions of 11.3% and 14.3% are observed on the STB and RHD datasets, respectively.
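To illustrate the root-joint alignment and biomechanical-feature unification described in the abstract, the following minimal Python sketch shows one common form of such normalization; it is an assumed, simplified stand-in for illustration only, not the authors' HPBM. The 21-keypoint indexing (wrist = 0, middle-finger MCP = 9) follows the MediaPipe hand-landmark convention and is likewise an assumption.

import numpy as np

def normalize_hand_pose(joints_3d, root_idx=0, ref_bone=(0, 9)):
    # joints_3d: (21, 3) array of 3D hand keypoints.
    # root_idx and ref_bone (wrist -> middle-finger MCP) are illustrative choices,
    # not the definition used by the HPBM in the paper.
    aligned = joints_3d - joints_3d[root_idx]            # root-joint alignment
    ref_len = np.linalg.norm(aligned[ref_bone[1]] - aligned[ref_bone[0]])
    return aligned / (ref_len + 1e-8)                    # unify scale via a reference bone length

# Usage: a random pose is normalized so the wrist sits at the origin and the
# wrist-to-middle-MCP bone has unit length.
pose = np.random.rand(21, 3)
norm_pose = normalize_hand_pose(pose)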
Funder
National Key Research and Development Program of China; National Natural Science Foundation of China