Global Spatio-Temporal Deformable Network for Skeleton-Based Gesture Recognition

Abstract: The key to gesture recognition from skeleton sequences is how to fuse spatio-temporal information and extract discriminative features. This paper proposes a keypoint focusing module. By combining global context modeling with a convolution that is not restricted to a fixed sampling pattern, the network can reach across multiple frames and across keypoints that are not directly related, adaptively aggregate the keypoint information most closely tied to the gesture over the global scope, and extract the spatio-temporal features of the gesture. Experiments show that the proposed method reaches accuracies of 94.88% on ChaLearn2013 and 95.23% on SHREC, outperforming existing methods. In addition, the method is more stable when handling noisy data and dynamic gestures.