A Normalization Strategy for Weakly Supervised 3D Hand Pose Estimation
Published: 2024-04-24
Issue: 9
Volume: 14
Page: 3578
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Guo Zizhao 1, Li Jinkai 2, Tan Jiyong 3
Affiliation:
1. College of Computer Science, Chengdu University, Chengdu 610106, China
2. College of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
3. Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen 610056, China
Abstract
The effectiveness of deep neural network models is intricately tied to the distribution of their training data. However, in pose estimation, potential discrepancies in root joint positions and inherent variability in biomechanical features across datasets are often overlooked by current training strategies. To address these challenges, a novel Hand Pose Biomechanical Model (HPBM) is developed. In contrast to the traditional 3D coordinate-encoded pose, it provides a more intuitive depiction of the anatomical characteristics of the hand. Through this model, a data normalization approach is implemented to align the root joint and unify the biomechanical features of training samples. Furthermore, the HPBM facilitates a weakly supervised strategy for dataset expansion, significantly enhancing data diversity. The proposed normalization method is evaluated on two widely used 3D hand pose estimation datasets, RHD and STB, demonstrating superior performance compared to models trained without the normalized datasets. Using ground-truth 2D keypoints as input, error reductions of 45.1% and 43.4% are achieved on the STB and RHD datasets, respectively. When using 2D keypoints from MediaPipe, error reductions of 11.3% and 14.3% are observed on the STB and RHD datasets, respectively.
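To illustrate the root-joint alignment and biomechanical-feature unification described in the abstract, the following minimal Python sketch shows one common form of such normalization; it is an assumed, simplified stand-in for illustration only, not the authors' HPBM. The 21-keypoint indexing (wrist = 0, middle-finger MCP = 9) follows the MediaPipe hand-landmark convention and is likewise an assumption.

import numpy as np

def normalize_hand_pose(joints_3d, root_idx=0, ref_bone=(0, 9)):
    # joints_3d: (21, 3) array of 3D hand keypoints.
    # root_idx and ref_bone (wrist -> middle-finger MCP) are illustrative choices,
    # not the definition used by the HPBM in the paper.
    aligned = joints_3d - joints_3d[root_idx]            # root-joint alignment
    ref_len = np.linalg.norm(aligned[ref_bone[1]] - aligned[ref_bone[0]])
    return aligned / (ref_len + 1e-8)                    # unify scale via a reference bone length

# Usage: a random pose is normalized so the wrist sits at the origin and the
# wrist-to-middle-MCP bone has unit length.
pose = np.random.rand(21, 3)
norm_pose = normalize_hand_pose(pose)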
Funder
National Key Research and Development Program of China; National Natural Science Foundation of China