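Behavior Recognition Based on Action Subspace and Weight Condition Random Field

WANG Zhi-wen (王智文); JIANG Lian-yuan (蒋联源); WANG Yu-hang (王宇航); OUYANG Hao (欧阳浩); ZHANG Can-long (张灿龙); HUANG Zhen-jin (黄镇谨); WANG Peng-tao (王鹏涛)

doi: 10.3969/j.issn.1001-0548.2017.02.016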

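1. College of Computer Science and Communication Engineering, Guangxi University of Science and Technology, Liuzhou, Guangxi 545006
2. Guangxi Experiment Center of Information Science, Guilin, Guangxi 541004
3. Institute of Automobile and Traffic Engineering, Guilin University of Aerospace Technology, Guilin, Guangxi 541004
4. School of Computer Science & Information Technology, Guangxi Normal University, Guilin, Guangxi 541004
5. College of Electrical and Information Engineering, Guangxi University of Science and Technology, Liuzhou, Guangxi 545006

Funding:

National Natural Science Foundation of China 61462008

National Natural Science Foundation of China 61365009

Natural Science Foundation of Guangxi Province 2013GXNSFAA019336

Natural Science Foundation of Guangxi Province 2014GXNSFAA118368

Natural Science Foundation of Guangxi Province 2016GXNSFBA380081

About the author:
WANG Zhi-wen (born 1969), male, PhD, professor; his research focuses on machine learning and computer vision, and on moving-object detection and recognition.

CLC number: TP393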


Abstract: For human behavior recognition in monocular video, this paper presents a method based on an action subspace and a weighted conditional random field. The method combines kernel principal component analysis (KPCA) for feature extraction with a weighted conditional random field (WCRF) for activity modeling. Nonlinear dimensionality reduction explores the underlying structure of the action space and preserves explicit temporal order in the projected motion trajectories, which yields a more compact representation of the human silhouette data. The WCRF models temporal sequences in multiple interacting ways, which improves joint accuracy through information sharing, and it has advantages over generative models: it relaxes the independence assumption between observations and can effectively incorporate both overlapping features and long-range dependencies. Experimental results show that the proposed method not only accurately recognizes human activities under temporal, intra-person, and inter-person variations, but is also considerably robust to noise and other factors.

Keywords: human behavior recognition; human silhouette extraction and representation; kernel principal component analysis; nonlinear dimensionality reduction; weighted conditional random field

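Fig. 1 Block diagram of behavior recognition

Fig. 2 Silhouette sequence of a walking person and its block-feature representation

Fig. 3 Dimensionality-reduction results of the PCA and KPCA methods

Fig. 4 Linear-chain CRF

Fig. 5 WCRF between two chains

Fig. 6 Example images from the behavior datasets

Fig. 7 Behavior recognition accuracy under different noise types and noise densities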

Table 1 Comparison of feature dimensionality and training/test set sizes used for human behavior recognition

Dataset	Feature dimensionality (this paper)	Training set size (this paper)	Test set size (this paper)	Feature dimensionality (Ref. [13])	Training set size (Ref. [13])	Test set size (Ref. [13])
WEI	226	1 023	113	506	4 092	2 156
KTH	128	8 762	975	480	10 742	9 868
a2a	59	1 023	9 085	123	2 265	30 296
epsilon	287	56 960	78 632	2 000	400 000	100 000
madelon	175	831	226	500	2 000	600
rcv1	7 632	10 278	333 594	47 236	20 242	677 399
w7a	106	8 659	7 963	300	24 692	25 057
segment	203	17 490	9 687	429	43 500	14 500

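Table 2 Correct classification accuracy of behaviors using the WCRF method (%)

Sub-block size	Ref. [24] dataset, w=0%	Ref. [24] dataset, w=1%	Ref. [22] dataset, w=0%	Ref. [22] dataset, w=1%
8×8	73.53	77.64	71.28	78.31
4×4	87.72	92.37	97.76	93.25
1×1	94.29	97.85	99.42	99.89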
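
As an illustration of what the sub-block sizes in Table 2 could mean in practice, the following minimal Python sketch computes block features from a binary silhouette. It assumes (this is not stated in the source) that each feature is the foreground pixel count of one non-overlapping block, normalized by the largest count; the function `block_features` and the toy image are hypothetical.

```python
def block_features(silhouette, block):
    """Return one normalized foreground count per non-overlapping block x block tile.

    Assumption for illustration only: the source does not specify how its
    block features are computed.
    """
    rows, cols = len(silhouette), len(silhouette[0])
    if rows % block or cols % block:
        raise ValueError("image size must be a multiple of the block size")
    counts = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            counts.append(sum(silhouette[r][c]
                              for r in range(r0, r0 + block)
                              for c in range(c0, c0 + block)))
    peak = max(counts)
    # Normalize by the fullest block so every feature lies in [0, 1].
    return [n / peak if peak else 0.0 for n in counts]


# Toy 4x4 binary silhouette (1 = foreground); hypothetical data.
toy = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]
coarse = block_features(toy, 2)  # 2x2 blocks give 4 features
fine = block_features(toy, 1)    # 1x1 blocks give 16 features
```

Smaller blocks keep more spatial detail at the cost of a longer feature vector, which matches the general trend in Table 2 that accuracy rises as the sub-block size shrinks.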

Table 3 Behavior classification using different methods

Method	Recognition accuracy/%
Template matching	81.86
HMM model	89.23
CRFs model (w=0)	91.75
CRFs model (w=1)	95.08
WCRF model (w=0)	99.84
WCRF model (w=1)	99.97

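Table 4 Robustness evaluation under other influencing factors

Test sequence	Varying condition	Recognition result	Recognized correctly?
Diagonal walk	Scale and viewpoint	Jump	No
Skipping in place	Non-rigid deformation	Run	No
Walking while swinging a bag	Rigid deformation	Jump	No
Sideways walk	Walking style	Side jump	No
Limping	Walking style	Walk	Yes
Walking with knees raised	Walking style	Walk	Yes
Walking in a skirt	Clothing	Walk	Yes
Walking with legs partially occluded	Partial occlusion	Walk	Yes
Walking while carrying a briefcase	Carried object	Walk	Yes
Normal walk	Background	Walk	Yes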

[1]	MARYAM Z, ROBERT B. Semantic human activity recognition:a literature review[J]. Pattern Recognition, 2015, 48(8):2329-2345. doi: 10.1016/j.patcog.2015.03.006
[2]	MATTHEW F, DAVID S, PAN Z X, et al. Recognizing human motions through mixture modeling of inertial data[J]. Pattern Recognition, 2015, 48(8):2394-2406. doi: 10.1016/j.patcog.2015.03.004
[3]	POPOOLA O P, WANG K J. Video-based abnormal human behavior recognition-a review[J]. IEEE Transactions on Systems, Man and Cybernetics Part C:Applications and Reviews, 2012, 42(6):1-14. http://www.oalib.com/references/9307043
[4]	NIEBLES J C, WANG H C, LI F F. Unsupervised learning of human action categories using spatial-temporal words[J]. International Journal of Computer Vision, 2008, 79(3):299-318. doi: 10.1007/s11263-007-0122-4
[5]	GORELICK L, BLANK M, SHECHTMAN E, et al. Action as space-time shapes[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007, 29(12):2247-2253. doi: 10.1109/TPAMI.2007.70711
[6]	BOBICK A, DAVIS J. The recognition of human movement using temporal templates[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(3):257-267. doi: 10.1109/34.910878
[7]	WANG S, ARIADNA Q, MORENCY L P, et al. Hidden conditional random fields for gesture recognition[C]//CVPR. New York:IEEE, 2006, 2:1521-1527.
[8]	PEHLIVAN S, DUYGULU P. A new pose-based representation for recognizing actions from multiple cameras[J]. Computer Vision & Image Understanding, 2011, 115(2):140-151.
[9]	WEINLAND D, RONFARD R, BOYER E. A survey of vision-based methods for action representation, segmentation and recognition[J]. Computer Vision & Image Understanding, 2011, 115(2):224-241. https://www.researchgate.net/publication/256979980_A_Survey_of_Vision-Based_Methods_for_Action_Representation_Segmentation_and_Recognition
[10]	MORRIS B T, TRIVEDI M M. Trajectory learning for activity understanding:Unsupervised, multilevel, and long-term adaptive approach[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2011, 33(11):2287-2301.
[11]	HOLZER S, ILIC S, NAVAB N. Multi-layer adaptive linear predictors for real-time tracking[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012, 35(1):105-117. https://www.researchgate.net/publication/223969594_Multi-Layer_Adaptive_Linear_Predictors_for_Real-Time_Tracking
[12]	SCHULDT C, LAPTEV I, CAPUTO B. Recognizing human actions:a local SVM approach[C]//17th International Conference on Pattern Recognition. Cambridge:IEEE, 2004, 3:32-36.
[13]	SELEN P, DAVID A F. Recognizing activities in multiple views with fusion of frame judgments[J]. Image and Vision Computing, 2014, 32(4):237-249. doi: 10.1016/j.imavis.2014.01.006
[14]	SHAO Z P, LI Y F. Integral invariants for space motion trajectory matching and recognition[J]. Pattern Recognition, 2015, 48(8):2418-2432. doi: 10.1016/j.patcog.2015.02.029
[15]	KEREM A, KARON E M. Recognizing affect in human touch of a robot[J]. Pattern Recognition Letters, 2015, 66(15):31-40.
[16]	STIKIC M, LARLUS D, EBERT S, et al. Weakly supervised recognition of daily life activities with wearable sensors[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2011, 33(12):2521-2537. http://www.academia.edu/16023877/Weakly_Supervised_Recognition_of_Daily_Life_Activities_with_Wearable_Sensors
[17]	WANG Zu-chao, LU Min, YUAN Xiao-ru, et al. Visual traffic jam analysis based on trajectory data[J]. IEEE Transactions on Visualization and Computer Graphics, 2013, 19(12):2159-2168. doi: 10.1109/TVCG.2013.228
[18]	WANG Heng, KLÄSER A, SCHMID C, et al. Dense trajectories and motion boundary descriptors for action recognition[J]. International Journal of Computer Vision, 2013, 103(1):60-79. doi: 10.1007/s11263-012-0594-8
[19]	BASHIR F I, KHOKHAR A A, DAN S. View-invariant motion trajectory-based activity classification and recognition[J]. Multimedia Systems, 2006, 12(1):45-54. doi: 10.1007/s00530-006-0024-2
[20]	CHO S Y, SOOY K, HYE B. Recognizing human-human interaction activities using visual and textual information[J]. Pattern Recognition Letters, 2013, 34(15):1840-1848. doi: 10.1016/j.patrec.2012.10.022
[21]	LIU Hao-wei, MATTHAI P, MARTIN P, et al. Recognizing object manipulation activities using depth and visual cues[J]. Journal of Visual Communication and Image Representation, 2014, 25(4):719-726. doi: 10.1016/j.jvcir.2013.03.015
[22]	WANG Liang, SUTER D. Recognizing human activities from silhouettes:Motion subspace and factorial discriminative graphical model[C]//CVPR. Minneapolis:IEEE, 2007:1-8.
[23]	VERES G V, GORDON L, CARTER J N, et al. What image information is important in silhouette-based gait recognition?[C]//CVPR. Washington:IEEE, 2004, 2:776-782.
[24]	REDDY K, SHAH M. Recognizing 50 human action categories of web videos[J]. Machine Vision and Applications, 2013, 24(5):971-981. doi: 10.1007/s00138-012-0450-4
[25]	WANG Xiao, LIU Xiao-fang. Nonlinear support vector machine for training data reduction in kernel space[J]. Journal of University of Electronic Science and Technology of China, 2013, 42(4):592-596. http://www.xb.uestc.edu.cn/nature/index.php?p=item&item_id=1330
[26]	WU Jian-ning, WANG Jue, LIU Li. Feature extraction via KPCA for classification of gait patterns[J]. Human Movement Science, 2007, 26(3):393-411. doi: 10.1016/j.humov.2007.01.015
[27]	SCHOLKOPF B, SMOLA A, MULLER K. Nonlinear component analysis as a kernel eigenvalue problem[J]. Neural Computation, 1998, 10(5):1299-1319. doi: 10.1162/089976698300017467
[28]	VEERARAGHAVAN A, CHELLAPPA R, ROY A K. The function space of an activity[C]//CVPR. New York:IEEE, 2006, 1:959-966.
[29]	LI Xu, HE Ming-yi, ZHANG Lei. New pansharpening method for WorldView-2 satellite images[J]. Journal of University of Electronic Science and Technology of China, 2015, 44(1):28-32. http://www.xb.uestc.edu.cn/nature/index.php?p=item&item_id=1624


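Human behavior recognition has broad application prospects, including video surveillance and monitoring, object-based video summarization, intelligent interfaces, human-computer interaction, sports video analysis, and video retrieval, and it has attracted growing attention from computer vision researchers [1-3]. Behavior recognition generally involves two key problems: 1) how to extract useful motion information from raw video data; and 2) how to build a reference model of motion so that training and recognition can cope effectively with spatial and temporal scale variations among similar behaviors within a class.

Behavior recognition can exploit a variety of cues, such as key poses [4-8], optical flow [9-10], local descriptors [11-13], motion trajectories or feature tracking [14-19], visual and textual information [20-21], and human silhouettes [22-24]. Key frames, however, lack motion information. Recognition based on optical flow or interest points is unreliable on smooth surfaces, at motion singularities, and in low-quality video. Feature tracking is also hard to achieve because human appearance and articulation vary greatly.

Human behavior is a spatio-temporal process, and spatio-temporal models (such as HMMs and their variants) have been widely used to model human actions [7, 12]. Such generative models, however, usually rely on strong independence assumptions, which make it hard for them to accommodate multiple complex features or long-range dependencies among observations. The conditional random field (CRF) models proposed in [4-5, 8] avoid the independence assumption between observations while incorporating complex features and long-range dependencies into the model. Building on this work, this paper proposes a behavior recognition method, based on an action subspace and a weighted conditional random field, that has joint discriminative learning capability. It uses KPCA to discover the intrinsic structure of the key action space [25-28] and uses the weighted conditional random field to recognize human behavior from simple silhouette observations. Experimental results demonstrate the effectiveness and robustness of the method.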
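
To make the KPCA step more concrete, here is a minimal pure-Python sketch of kernel PCA restricted to the first component. It is not the authors' implementation: the RBF kernel, the value of `gamma`, the use of power iteration, and the toy feature vectors are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math


def rbf_kernel_matrix(X, gamma):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]


def center_kernel(K):
    """Center the (symmetric) Gram matrix in feature space."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def top_eigenpair(M, iters=200):
    """Leading eigenpair of a symmetric PSD matrix by power iteration."""
    n = len(M)
    v = [1.0 + 0.1 * i for i in range(n)]  # fixed, deterministic start vector
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(a * b for a, b in zip(v, Mv))  # Rayleigh quotient
    return lam, v


def kpca_first_component(X, gamma):
    """Project the training samples onto the first kernel principal component."""
    lam, v = top_eigenpair(center_kernel(rbf_kernel_matrix(X, gamma)))
    # With alpha = v / sqrt(lam), the projection Kc @ alpha equals sqrt(lam) * v.
    scale = math.sqrt(max(lam, 0.0))
    return lam, [scale * x for x in v]


# Hypothetical 2-D feature vectors: three frames near one pose, three near another.
frames = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
          [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]]
lam, proj = kpca_first_component(frames, gamma=0.5)
```

In this toy case the first component separates the two groups of frames by sign. A fuller treatment would keep several components and project unseen frames through the same centered kernel.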

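4. Conclusion

This paper presents an effective probabilistic framework for behavior recognition based on an action subspace and a weighted conditional random field. The method is novel in two respects:

1) Feature extraction and representation: simple, easily extracted spatio-temporal human silhouettes are chosen as input and embedded into a low-dimensional kernel space.

2) Behavior modeling and recognition: the WCRF is applied in the vision domain, where it shows advantages over HMMs and ordinary CRFs. The proposed framework does not depend on the features used and can easily be extended to other types of video-based behavior analysis. The feature dimensionality and the training and test set sizes used by the method are all markedly smaller, which reduces the computational workload and meets the requirements of real-time behavior recognition.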
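
The benefit of chain-structured discriminative models over frame-by-frame classification can be sketched with Viterbi decoding for an ordinary linear-chain CRF. This is not the WCRF proposed in the paper, whose details are not given in this text; the label set, the score values, and the function `viterbi` are hypothetical and serve only to show how transition scores smooth out a single noisy frame.

```python
def viterbi(emission, transition):
    """Highest-scoring label path of a linear-chain model.

    emission[t][s]   : score of label s at frame t
    transition[p][s] : score of moving from label p to label s
    """
    n_labels = len(transition)
    best = list(emission[0])
    back = []
    for t in range(1, len(emission)):
        prev_best, best, ptr = best, [], []
        for s in range(n_labels):
            # Best predecessor label for label s at frame t.
            p = max(range(n_labels), key=lambda q: prev_best[q] + transition[q][s])
            best.append(prev_best[p] + transition[p][s] + emission[t][s])
            ptr.append(p)
        back.append(ptr)
    last = max(range(n_labels), key=lambda s: best[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical scores for labels 0 = "walk", 1 = "run" over five frames.
# Frame 2 is noisy and slightly favors "run" when viewed in isolation.
emission = [[2.0, 0.0], [1.5, 0.2], [0.4, 0.6], [1.8, 0.1], [2.2, 0.0]]
transition = [[0.5, -0.5], [-0.5, 0.5]]  # staying in a label is rewarded

path, score = viterbi(emission, transition)
greedy = [max(range(2), key=lambda s: e[s]) for e in emission]
```

Here the per-frame choice `greedy` flips to "run" at the noisy frame, while the chain decoding in `path` keeps "walk" throughout because the transition scores penalize switching labels.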

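This work was supported by the Liuzhou Scientific Research and Technology Development Program (2016C050205), the Open Fund of the Guangxi Experiment Center of Information Science (KF1403), the Doctoral Fund of Guangxi University of Science and Technology (院科博12Z14), and the Guangxi University of Science and Technology innovation team "Image Processing, Intelligent Cognition and Applications". The authors express their thanks.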

Contents

1.1. Human silhouette extraction and representation

1.2. Nonlinear dimensionality reduction

2.1. Ordinary CRFs

2.2. WCRF

2.3. Training and inference
