Abstract:
The key challenge in skeleton-based gesture recognition is how to fuse spatio-temporal information and extract discriminative features. This paper proposes a key-point focusing module. Through global context modeling and a convolution scheme not restricted to a fixed receptive form, the network can span multiple frames, bypass irrelevant key points, adaptively aggregate the key-point information most closely related to the gesture over a global scope, and extract the spatio-temporal characteristics of the gesture. Experiments on the Chalearn2013 and SHREC datasets show that the proposed method reaches accuracies of 94.88% and 95.23%, respectively, outperforming state-of-the-art methods. In addition, the method is more robust when handling noisy data and dynamic gestures.
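To make the aggregation idea concrete, the sketch below shows one possible form of global-context key-point attention over a skeleton sequence tensor of shape (batch, channels, frames, joints); the module name, layer choices, and dimensions are illustrative assumptions, not the paper's exact implementation. A single softmax over all frame-joint positions lets key points contribute to the context regardless of their spatial or temporal distance.

```python
# Minimal sketch of adaptive key-point aggregation over a skeleton sequence.
# Input shape assumed to be (N, C, T, V): batch, channels, frames, joints.
# All names and dimensions are illustrative, not the paper's exact design.
import torch
import torch.nn as nn


class KeyPointFocus(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution scores every (frame, joint) position for relevance.
        self.score = nn.Conv2d(channels, 1, kernel_size=1)
        # Transform applied to the globally pooled context vector.
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, t, v = x.shape
        # Attention weights over all T*V positions (spans frames and joints).
        attn = torch.softmax(self.score(x).view(n, 1, t * v), dim=-1)
        # Global context: weighted sum of features over every frame and joint.
        context = torch.bmm(x.view(n, c, t * v), attn.transpose(1, 2))  # (N, C, 1)
        context = context.view(n, c, 1, 1)
        # Re-inject the gesture-relevant context into every position.
        return x + self.transform(context)


if __name__ == "__main__":
    seq = torch.randn(2, 64, 32, 22)  # 2 clips, 64 channels, 32 frames, 22 joints
    out = KeyPointFocus(64)(seq)
    print(out.shape)                  # torch.Size([2, 64, 32, 22])
```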