Data Encoding and CNN Accurate Recognition of Human Body Motion

HU Qing-song; ZHANG Liang; DING Juan; LI Shi-yin

doi:10.12178/1001-0548.2019108

Accurate recognition of human motion is hampered by many adverse factors, such as the influence of light intensity during motion data collection and the vagueness and variability of human motion features. To reduce the effect of these factors and improve recognition accuracy, this paper investigates the following. First, the human joint data collected by Kinect are preprocessed to overcome the illumination problem. Second, encoding methods are proposed to encode the preprocessed data, and the encoded data are fed into a CNN that extracts human motion features automatically, which makes motion features easier to describe. Finally, the CNN completes motion classification with SoftMax. Experiments show that the proposed algorithm achieves high recognition accuracy (most F₁ values exceed 0.8) and adapts to different data settings; in single property tests, compound property data perform better than single property data, with F₁ values as high as 0.916; F₁ values in compound property tests are lower than those in single property tests, with a maximum decrease of 25%.
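The F₁ values quoted in the abstract combine precision and recall for a motion class. The following is a minimal sketch of a per-class F₁ computation; the motion labels and predictions are hypothetical and are not taken from the paper's experiments, and the averaging scheme used in the paper is not stated in this excerpt.

```python
def f1_score(true_labels, predicted_labels, positive_class):
    """Per-class F1: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive_class and p == positive_class)
    fp = sum(1 for t, p in pairs if t != positive_class and p == positive_class)
    fn = sum(1 for t, p in pairs if t == positive_class and p != positive_class)
    if tp == 0:
        return 0.0  # no correct prediction for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and classifier output (illustration only).
true_labels = ["wave", "wave", "kick", "kick", "wave", "bow"]
predicted = ["wave", "kick", "kick", "kick", "wave", "bow"]

for motion in ("wave", "kick", "bow"):
    print(motion, round(f1_score(true_labels, predicted, motion), 3))
```

In this toy data one "wave" sample is misread as "kick", which lowers recall for "wave" and precision for "kick", so both classes end up with an F₁ of 0.8.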


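Human motion recognition^[1] aims to automatically identify, online or offline, the motion a person is performing from data collected by sensors. It is the result of the convergence of computer vision^[2], machine learning^[3], pattern recognition^[4], and artificial intelligence^[5], and it has broad application prospects in novel human-computer interaction, virtual reality, augmented reality, and assisted training.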

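Human motion recognition algorithms fall into two main categories: template matching and machine learning. A template matching algorithm compares a motion instance with the motions in a template library, and the library motion most similar to the instance is taken as the recognition result; the recognition algorithm based on temporal self-similarity and dynamic time warping proposed in reference [6] belongs to this category. The drawback of template matching is that the cost of comparison keeps growing as the template library grows. A machine learning algorithm trains a classifier on a set of motion instances so that the classifier can distinguish the commonalities and differences among motions, and then uses the classifier to classify motions; the method proposed in reference [7] belongs to this category. It combines spatio-temporal features with the 3D-SIFT descriptor to extract human features and uses the SVM (support vector machine) algorithm to recognize human motions. Like recognition based on template matching, recognition based on machine learning also requires motion features to be extracted manually, so it is cumbersome to apply and hard to generalize.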
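To make the template matching idea concrete, the sketch below picks the library motion with the smallest dynamic time warping (DTW) distance to an instance. It is a generic nearest-template illustration, not the algorithm of reference [6]; the one-dimensional sequences, the labels, and the function names are assumptions made for the example.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j]: best alignment cost of seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # skip in seq_b
                                    cost[i][j - 1],      # skip in seq_a
                                    cost[i - 1][j - 1])  # match both
    return cost[n][m]


def match_template(instance, templates):
    """Return the label of the template closest to the instance."""
    return min(templates, key=lambda label: dtw_distance(instance, templates[label]))


# Hypothetical library: one joint coordinate over time per motion.
templates = {
    "raise_arm": [0.0, 0.5, 1.0, 1.0],
    "lower_arm": [1.0, 0.5, 0.0, 0.0],
}
instance = [0.0, 0.1, 0.6, 0.9, 1.0]
print(match_template(instance, templates))
```

Every query is compared against every template, so the matching cost grows linearly with the size of the library, which is the drawback noted above.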

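Unlike conventional machine learning, deep learning algorithms^[8-10] can extract target features automatically. Among them, the CNN (convolutional neural network) is especially widely used^[11] and performs well in many settings, which makes it suitable for complex tasks such as human motion recognition. This paper therefore proposes a CNN-based algorithm for accurate human motion recognition: human skeleton data are encoded, and a deep learning framework is then built to accurately recognize the human motions contained in the data.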

4. Conclusion

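Human motion recognition has broad application prospects in novel human-computer interaction and many other fields. This paper proposes an accurate human motion recognition algorithm based on a convolutional neural network, which is used to train on and recognize motion datasets collected by sensors such as Kinect. To process motion data within a CNN framework, this paper designs a tiled encoding, a loop-shaped ("回") encoding, and a zigzag ("之"-shaped) encoding; these encoding methods convert motion data into grayscale or color images. Experiments show that the proposed method achieves a high recognition rate and strong generalization ability. In future work, the algorithm will be applied in engineering settings to determine whether workers in special posts (for example, maintenance workers or gangue sorters in underground coal mines) have performed prohibited operations, thereby reducing the incidence of production accidents.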
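The idea of turning motion data into an image can be sketched as follows. This excerpt does not give the exact definitions of the encodings, so the code below is only an assumed interpretation: the tiled layout fills pixel rows left to right, and the zigzag layout reverses every second row. The min-max scaling to gray levels, the function names, and the sample values are all hypothetical, and the loop-shaped encoding is not attempted.

```python
def to_gray(values):
    """Min-max scale floats to integer gray levels in 0..255 (assumed scaling)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    return [round(255 * (v - lo) / span) for v in values]


def tiled_encoding(values, width):
    """Fill image rows left to right; pad the last row with zeros."""
    pixels = to_gray(values)
    pixels += [0] * (-len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def zigzag_encoding(values, width):
    """Same as tiled, but every second row runs right to left."""
    rows = tiled_encoding(values, width)
    return [row if r % 2 == 0 else row[::-1] for r, row in enumerate(rows)]


# Hypothetical flattened joint coordinates from a few frames.
joint_values = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(tiled_encoding(joint_values, 3))
print(zigzag_encoding(joint_values, 3))
```

Under these assumptions, the zigzag layout keeps consecutive values adjacent at row boundaries, whereas the tiled layout jumps back to the left edge at the start of each row.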

References

[1]	JI S W, XU W, YANG M. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2013, 35(1): 221-231.
[2]	ZHANG H B, ZHANG Y X, ZHONG B N, et al. A comprehensive survey of vision-based human action recognition methods[J]. Sensors, 2019, 19(1005): 1-20.
[3]	MURPHY K P. Machine learning: A probabilistic perspective[M]. Cambridge: The MIT Press, 2012.
[4]	WENG J W, WENG C Q, YUAN J S, et al. Discriminative spatio-temporal pattern discovery for 3D action recognition[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2019, 29(4): 1077-1089.
[5]	RUSSELL S, NORVIG P. Artificial intelligence: A modern approach[M]. The 3rd edition. Chennai: Pearson Education India, 2015.
[6]	WANG J, ZHENG H C. View-robust action recognition based on temporal self-similarities and dynamic time warping[C]//Proceedings of the 2012 IEEE International Conference on Computer Science & Automation Engineering (CSAE). Zhangjiajie: IEEE, 2012: 498-502.
[7]	CHEN Sheng-di, HE Bing-qian, CHEN Si-yu, et al. Human action recognition based on spatio-temporal interest point[J]. Journal of Chengdu University of Information Technology, 2018, 33(2): 143-148.
[8]	DENG L, YU D. Deep learning: Methods and applications[M]. Boston: Now Publishers Inc, 2014.
[9]	ZHANG Liang. Algorithm research and system design of human motion recognition based on Kinect[D]. Xuzhou: China University of Mining and Technology, 2019.
[10]	SCHMIDHUBER J. Deep learning in neural networks: An overview[J]. Neural Networks, 2015, 61: 85-117.
[11]	GONG Ding-xi. Action recognition method based on sparse auto-combination spatio-temporal convolutional neural network and its MapReduce implementation[D]. Xiamen: Xiamen University, 2014.
[12]	FOTHERGILL S, MENTIS H M, KOHLI P, et al. Instructing people for training gestural interactive systems[C]//Proceedings of the 30th ACM Conference on Human Factors in Computing Systems. Austin: ACM, 2012: 1737-1746.
[13]	JANG E, GU S X, POOLE B. Categorical reparameterization with gumbel-softmax[C]//Proceedings of ICLR 2017: 5th International Conference on Learning Representations. Toulon: International Conference on Learning Representations, 2017.
[14]	LITJENS G, SÁNCHEZ C I, TIMOFEEVA N, et al. Deep learning as a tool for increased accuracy and efficiency of histopathological diagnosis[J]. Scientific Reports, 2016, 6: 26286.
[15]	DIAO Jun-fang. Human action recognition research based on Kinect[D]. Chongqing: Chongqing University, 2015.
[16]	LI W Q, ZHANG Z Y, LIU Z C. Action recognition based on a bag of 3D points[C]//Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. San Francisco: IEEE, 2010: 25-30.


