Abstract:
For hand pose estimation, an immediate problem is reducing the need for labeled data, which is difficult to provide in the desired quantity, realism, and accuracy. To meet this need, a novel multi-view-projection-based semi-supervised learning algorithm is proposed. First, 3D hand points are extracted from a single unlabeled depth image and projected onto three orthogonal planes. Second, an encoder-decoder model is applied to learn the latent representations of two projections. Finally, a small amount of labeled data is used to learn a mapping from the latent representation to hand joint coordinates. The proposed algorithm is evaluated on the NYU hand pose estimation dataset, and the experimental results demonstrate its effectiveness and advantages.
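The multi-view projection step described above can be sketched as follows. This is a minimal illustration, assuming each orthogonal projection is obtained by dropping one coordinate of the 3D point cloud; the function name is hypothetical, and the paper's actual pipeline may instead rasterize the points into 2D depth maps before feeding the encoder-decoder.

```python
import numpy as np

def orthogonal_projections(points):
    """Project an (N, 3) hand point cloud onto the three orthogonal
    planes (XY, YZ, XZ) by dropping one coordinate per plane.

    Illustrative sketch only; real systems typically rasterize these
    2D points into images for the encoder-decoder.
    """
    xy = points[:, [0, 1]]  # drop z: front view
    yz = points[:, [1, 2]]  # drop x: side view
    xz = points[:, [0, 2]]  # drop y: top view
    return xy, yz, xz

# Toy example with two 3D hand points.
pts = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])
xy, yz, xz = orthogonal_projections(pts)
```

Each returned array has shape (N, 2), one planar view of the same point cloud; the encoder-decoder then learns a latent representation from these views without requiring joint labels.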