A Crowd Counting Method Based on Convolutional Neural Networks and Density Distribution Features

GUO Ji-chang; LI Xiang-peng

doi:10.3969/j.issn.1001-0548.2018.06.002

Accurate crowd counting is difficult because of occlusion, shadows, and changes in crowd density. This paper presents an approach that combines convolutional neural networks with density distribution features. The crowd scene is segmented into blobs according to density. For low-density blobs, the Retinex algorithm is used to denoise the scene, which is then transformed into HSV color space to locate pedestrians; convolutional neural networks trained with a grid loss function extract pedestrian features and reduce the effect of occlusion. For high-density blobs, crowd density distribution features are extracted to train multiple kernel regression models that estimate the number of people. Experiments are conducted on the PETS2009 and UCSD datasets. The results show that the proposed method improves counting accuracy over the compared algorithms.

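In recent years, with the continuing development of computer technology, intelligent video surveillance systems have been widely deployed in public places such as shopping malls, schools, and railway stations to keep society running in an orderly and stable manner. People counting is a practically significant research direction in intelligent video surveillance and one of the active and difficult problems in computer vision. Accurately counting the people in a monitored scene is important for public security, commercial information gathering, and the allocation of public resources and facilities.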

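Pedestrian counting methods fall mainly into two groups: those based on object detection and those based on feature regression. Both rely on supervised machine learning; there are also unsupervised approaches such as trajectory clustering^[1]. Among supervised methods, pedestrian detection with HOG^[2] is one of the most widely used: it builds pedestrian features by computing and accumulating histograms of gradient orientations over local image regions. Other detection methods extract head or facial features or use template matching, such as the LBP algorithm^[3] and the DPM algorithm^[4], and then train a pedestrian classifier with an SVM or an AdaBoost cascade classifier^[3] to recognize and detect pedestrians. These methods lose accuracy under heavy occlusion or poor illumination. Feature-regression methods extract features such as texture from a region and then use a kernel function to regress from the texture features to the number of people. They effectively reduce the impact of mutual occlusion on counting, but crowd distribution is difficult to describe fully with mathematical features, which limits counting accuracy.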
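The gradient-orientation histogram at the core of HOG can be illustrated with a minimal sketch. This is a simplified illustration rather than the full HOG descriptor of reference [2]: it assumes central-difference gradients, 9 unsigned orientation bins, and hard bin assignment, and it omits block normalization. The function name `cell_orientation_histogram` is hypothetical.

```python
import math

def cell_orientation_histogram(cell, bins=9):
    """Histogram of unsigned gradient orientations (0-180 degrees) for one cell.

    `cell` is a 2D list of grayscale values. Gradients use central differences
    on interior pixels, and each pixel votes with its gradient magnitude.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    rows, cols = len(cell), len(cell[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180) so opposite gradients share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# A vertical edge: intensity jumps from 0 to 10 between columns 1 and 2.
patch = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_orientation_histogram(patch)
print(hist)
```

For this patch every interior pixel has a purely horizontal gradient, so all votes land in the first bin. A detector would concatenate such histograms over many cells and pass them to a classifier such as an SVM.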

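In recent years, as deep learning theory has matured and hardware performance has improved, convolutional neural networks have become a powerful tool in computer vision and pattern recognition. Reference [5] applied an optimized convolutional neural network architecture to object classification and achieved satisfying results on the ImageNet image database. Reference [6] proposed the R-CNN (region proposal-CNN) algorithm, implemented on the GPU-accelerated Caffe framework, which has become one of the classic deep learning algorithms for object detection. Reference [7] used a convolutional neural network to extract crowd distribution features for training and estimated the number of pedestrians in video with the resulting model, an example of deep learning applied to pedestrian detection. Although these algorithms extract effective pedestrian features and build accurate prediction models, they still do not handle occlusion, illumination changes, and uneven crowd distribution well.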

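To address these problems, the conditions of real scenes must be considered before pedestrian features are extracted from surveillance video: irregular crowd clustering, mutual occlusion between pedestrians, dim lighting, and complex outdoor factors such as rain and fog. This paper selects and improves existing algorithms accordingly and proposes a people counting algorithm that combines convolutional neural networks with density distribution features. Each feature extraction algorithm is applied to the kind of scene it suits, which effectively addresses the interference factors above and improves counting accuracy.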
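The idea of applying a different counting strategy to each kind of region can be sketched as a simple dispatch over blobs. This is only an illustrative sketch under stated assumptions: the `Blob` class, the density definition (foreground pixels divided by blob area), the threshold of 0.5, and the stand-in `detect` and `regress` callables are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Blob:
    area: int        # pixel area of the blob (hypothetical measure)
    foreground: int  # foreground pixels inside the blob (hypothetical measure)

    @property
    def density(self) -> float:
        return self.foreground / self.area

def count_people(blobs, detect, regress, threshold=0.5):
    """Sum per-blob counts, choosing the counter by blob density.

    Low-density blobs go to a detector (for example, a CNN-based one);
    high-density blobs go to a regressor on density features.
    """
    total = 0.0
    for blob in blobs:
        if blob.density < threshold:
            total += detect(blob)
        else:
            total += regress(blob)
    return total

# Stand-in counters, for illustration only.
detect = lambda blob: 2                      # pretend the detector found 2 people
regress = lambda blob: blob.foreground / 50  # pretend 50 pixels per person

blobs = [Blob(area=1000, foreground=200), Blob(area=1000, foreground=800)]
total = count_people(blobs, detect, regress)
print(total)
```

The point of the design is that detection and regression fail in different conditions, so routing each blob to the counter suited to its density avoids applying either one where it is weakest.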

5. Conclusion

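In pedestrian surveillance video, occlusion between pedestrians, changes in scene illumination, and uneven crowd distribution make it difficult for existing algorithms to count people accurately. To address these problems, this paper partitions unevenly distributed pedestrians by density and proposes a counting algorithm that combines recognition by convolutional neural networks with density-feature regression. To keep illumination changes and rain or fog from interfering with the algorithm, the scene is denoised and enhanced, then converted to HSV color space to pre-locate pedestrians and extract features. A block-wise training algorithm for convolutional neural networks based on a grid loss function is proposed, which enables recognition of the local visible parts of occluded pedestrians. A regression algorithm that fuses pedestrian density distribution features is proposed to improve counting accuracy.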
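A minimal sketch of a grid-style loss may help make the block-wise training idea concrete. It assumes the general form associated with reference [9]: a hinge loss on the whole-window score plus a weighted sum of hinge losses on per-block scores, where the whole-window score is the sum of the block scores and the block margin is 1/N. The paper's own variant is not given in this excerpt, so the function names, the linear block classifiers, and the weight `lam` are assumptions.

```python
def hinge(margin, label, score):
    """Hinge loss with a configurable margin; label is +1 or -1."""
    return max(0.0, margin - label * score)

def grid_loss(block_features, block_weights, block_biases, label, lam=1.0):
    """Whole-window hinge loss plus per-block hinge losses.

    Each block has its own linear classifier. The whole-window classifier
    shares those weights, so its score is the sum of the block scores.
    """
    n = len(block_features)
    block_scores = [
        sum(w * f for w, f in zip(ws, fs)) + b
        for ws, fs, b in zip(block_weights, block_features, block_biases)
    ]
    whole_score = sum(block_scores)
    whole_term = hinge(1.0, label, whole_score)
    # Each block only needs to reach a margin of 1/n on its own.
    block_term = sum(hinge(1.0 / n, label, s) for s in block_scores)
    return whole_term + lam * block_term

# Two blocks; the second is occluded and contributes zero features.
features = [[1.0, 0.0], [0.0, 0.0]]
weights = [[1.0, 0.0], [1.0, 0.0]]
biases = [0.0, 0.0]
loss = grid_loss(features, weights, biases, label=+1)
print(loss)
```

In this example the whole-window term is zero because the visible block alone reaches the margin, while the occluded block still incurs a penalty. Training against the per-block terms pushes every block to be discriminative on its own, which is what lets partially visible pedestrians be recognized.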

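Experiments show that, in the same scenes, the proposed method outperforms comparable methods. However, it does not fully account for the effect of distance on blob size or for the appearance of non-pedestrian objects. Future work will look for better ways to improve the algorithm's performance.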

References

[1]	ANTONINI G, THIRAN J P. Counting pedestrians in video sequences using trajectory clustering[J]. IEEE Transactions on Circuits & Systems for Video Technology, 2006, 16(8): 1008-1020.
[2]	DALAL N, TRIGGS B. Histograms of oriented gradients for human detection[C]//IEEE Conference on Computer Vision & Pattern Recognition. [S.l.]: IEEE, 2005.
[3]	YANG S, LIAO X, BORASY U K. A pedestrian detection method based on the HOG-LBP feature and gentle AdaBoost[J]. International Journal of Advancements in Computing Technology, 2012, 4(19): 553-560. doi: 10.4156/ijact
[4]	FELZENSZWALB P F, GIRSHICK R B, MCALLESTER D, et al. Object detection with discriminatively trained part-based models[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2010, 32(9): 1627-1645.
[5]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//International Conference on Neural Information Processing Systems. Lake Tahoe, USA: Curran Associates Inc, 2012: 1097-1105.
[6]	GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//2014 IEEE Conference on Computer Vision and Pattern Recognition. Ohio, USA: IEEE, 2014: 580-587.
[7]	ZHANG C, LI H, WANG X, et al. Cross-scene crowd counting via deep convolutional neural networks[C]// Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2015: 833-841.
[8]	RAHMAN Z U, JOBSON D J, WOODELL G A. Retinex processing for automatic image enhancement[J]. Human Vision and Electronic Imaging VII, 2002, 13(1): 100-110.
[9]	OPITZ M, WALTNER G, POIER G, et al. Grid loss: detecting occluded faces[M]//Computer Vision – ECCV. [S.l.]: Springer International Publishing, 2016.
[10]	UIJLINGS J R R, SANDE K E A V D, GEVERS T. Selective search for object recognition[J]. International Journal of Computer Vision, 2013, 104(2): 154-171.
[11]	GEUSEBROEK J M, VAN D B R, SMEULDERS A W M. Color invariance[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2001, 23(12): 1338-1350.
[12]	MAAS A L, HANNUN A Y, NG A Y. Rectifier nonlinearities improve neural network acoustic models[C]//Proceedings of the 30th International Conference on Machine Learning. Atlanta, Georgia, USA: ICML, 2013.
[13]	SRIVASTAVA N, HINTON G, KRIZHEVSKY A. Dropout: a simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[14]	VILLAMIZAR M, GRABNER H, MORENO-NOGUER F, et al. Efficient 3D object detection using multiple pose-specific classifiers[C]//Proceedings of the British Machine Vision Conference. Dundee, Scotland, UK: BMVA Press, 2011.
[15]	CHAN A B, VASCONCELOS N. Counting people with low-level features and Bayesian regression[J]. IEEE Transactions on Image Processing, 2012, 21(4): 2160-2177. doi: 10.1109/TIP.2011.2172800
[16]	XUE Chen. People counting system in complex scenario[D]. Tianjin: Tianjin University, 2012.
[17]	LEMPITSKY V S, ZISSERMAN A. Learning to count objects in images[C]//Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc, 2010: 1324-1332.
[18]	LEHMUSSOLA A, RUUSUVUORI P, SELINUMMI J. Computational framework for simulating fluorescence microscope images with cell populations[J]. IEEE Transactions on Medical Imaging, 2007, 26(7): 1010-1016. doi: 10.1109/TMI.2007.896925
[19]	KLOFT M, BREFELD U, SONNENBURG S. lp-Norm multiple Kernel learning[J]. Journal of Machine Learning Research(S1533-7928), 2011, 12(3): 953-997.
[20]	CHEN K, GONG S, XIANG T, et al. Cumulative attribute space for age and crowd density estimation[C]//Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2013: 2467-2474.
[21]	SUBBURAMAN V B, DESCAMPS A, CARINCOTTE C. Counting people in the crowd using a generic head detector[C]//IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance. [S.l.]: IEEE, 2012: 470-475.
[22]	CONTE D, FOGGIA P, PERCANNELLA G. A method for counting moving people in video surveillance videos[J]. EURASIP Journal on Advances in Signal Processing, 2010. doi: 10.1155/2010/231240
[23]	RAO A S, GUBBI J, MARUSIC S. Estimation of crowd density by clustering motion cues[J]. The Visual Computer, 2015, 31(11): 1533-1552. doi: 10.1007/s00371-014-1032-4

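Color space	RGB	I	Lab	HSV	rgb	C	H
Light intensity	-	-	+/-	2/3	+	+	+
Shadow/occlusion	-	-	+/-	2/3	+	+	+
Highlights	-	-	-	1/3	-	+/-	+
(Note: "+/-" indicates that some color components are invariant; 1/3 indicates that one channel's color component is invariant.)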
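The HSV entry for light intensity in the table can be checked with a small experiment. The sketch below assumes that an intensity change is modeled as a uniform scaling of the R, G, and B values, which is a simplification, and uses Python's standard `colorsys` module. The helper name `hsv_under_intensity_scaling` is hypothetical.

```python
import colorsys

def hsv_under_intensity_scaling(rgb, factor):
    """Return the HSV triple before and after scaling all RGB channels.

    `rgb` holds floats in [0, 1]. Uniform scaling is used here as a simple
    stand-in for a change in light intensity.
    """
    original = colorsys.rgb_to_hsv(*rgb)
    scaled = colorsys.rgb_to_hsv(*(c * factor for c in rgb))
    return original, scaled

original, scaled = hsv_under_intensity_scaling((0.8, 0.4, 0.2), 0.5)
print("H, S, V before:", original)
print("H, S, V after: ", scaled)
```

Under this model hue and saturation are unchanged and only the value channel follows the intensity, so two of the three HSV channels are invariant. That is consistent with the 2/3 entry and is one reason HSV is convenient for locating pedestrians under varying illumination.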

Algorithm	MAE	MSE
Cumulative Attribute Regression^[20]	2.07	0.08
clustering motion cues^[23]	2.97	0.12
Reference [7]	1.60	0.10
Proposed method	1.40	0.07

Algorithm	MAE	MSE
Reference [22]	4.00	0.24
clustering motion cues^[23]	2.78	0.21
Head detection^[21]	2.40	0.12
Proposed method	2.10	0.08
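For reference, the two error measures named in the tables can be computed as in the sketch below. It assumes the standard definitions of mean absolute error and mean squared error over per-frame counts. The paper's exact definitions are not given in this excerpt, and the reported MSE values are smaller than the MAE values, which suggests a normalized variant may have been used. The counts in the example are made up for illustration.

```python
def mean_absolute_error(truth, predicted):
    """Average absolute difference between true and predicted counts."""
    return sum(abs(t - p) for t, p in zip(truth, predicted)) / len(truth)

def mean_squared_error(truth, predicted):
    """Average squared difference between true and predicted counts."""
    return sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth)

# Hypothetical per-frame counts, not data from the paper.
truth = [10, 20, 30, 40]
predicted = [12, 19, 30, 37]

mae = mean_absolute_error(truth, predicted)
mse = mean_squared_error(truth, predicted)
print(mae, mse)
```

MAE reflects the typical size of the counting error, while MSE penalizes large misses more heavily, so the two together indicate both accuracy and stability.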

Corresponding author: CHEN Bin, bchen63@163.com
