Hand Pose Estimation through Semi-Supervised Learning with Multi-View Projection

Abstract: Labeled data for hand pose estimation is difficult to obtain in the desired quantity, realism, and accuracy. To reduce the need for such data, a semi-supervised learning algorithm based on multi-view projection is proposed. First, the hand region is segmented from a single unlabeled depth image, and the resulting 3D hand points are projected onto three orthogonal planes. Second, an encoder-decoder model learns a joint representation of two of the projected views in a low-dimensional latent space. Finally, a small amount of labeled data is used to learn a regression mapping from the latent representation to the 3D coordinates of the hand joints. Experiments on the NYU hand pose estimation dataset show that the algorithm reduces the dependence on labeled data while achieving good results.
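The projection step can be illustrated with a minimal sketch. It assumes the hand points are already segmented and given as an (N, 3) array. The function name `project_to_orthogonal_planes`, the image size, the bounding-box normalization, and the depth encoding are all illustrative assumptions; the abstract does not specify these details.

```python
import numpy as np


def project_to_orthogonal_planes(points, size=64):
    """Project an (N, 3) point cloud onto the xy, yz, and zx planes.

    Hypothetical helper: normalization and depth encoding are assumptions,
    not the procedure used in the paper.
    """
    points = np.asarray(points, dtype=np.float64)

    # Scale the points into the unit cube using their bounding box,
    # keeping the aspect ratio by dividing by the largest side length.
    lo = points.min(axis=0)
    extent = (points.max(axis=0) - lo).max()
    extent = extent if extent > 0 else 1.0
    unit = (points - lo) / extent  # values in [0, 1]

    # Integer pixel coordinates, clamped so that 1.0 maps to the last pixel.
    pix = np.minimum((unit * size).astype(int), size - 1)

    # For each plane: (row axis, column axis, axis perpendicular to the plane).
    planes = {"xy": (1, 0, 2), "yz": (1, 2, 0), "zx": (2, 0, 1)}
    views = {}
    for name, (r, c, d) in planes.items():
        img = np.zeros((size, size))
        # Store 1 - normalized depth and keep the maximum per pixel, so the
        # point closest to the plane wins. Background pixels stay at 0
        # (the farthest point also encodes to 0 under this scheme).
        np.maximum.at(img, (pix[:, r], pix[:, c]), 1.0 - unit[:, d])
        views[name] = img
    return views


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(500, 3))  # stand-in for segmented hand points
    for name, img in project_to_orthogonal_planes(cloud).items():
        print(name, img.shape, int((img > 0).sum()), "occupied pixels")
```

Under these assumptions, each view is a small single-channel image that an encoder-decoder model could take as input or use as a reconstruction target.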