Abnormal Event Detection Based on Multi-Scale Features Prediction

WANG Jun

doi:10.12178/1001-0548.2021333

A novel method for abnormal event detection based on multi-scale feature prediction is proposed. First, a dilated convolution network extracts features with receptive fields of different sizes and fuses them, so that objects of different scales in a video frame can be handled. Second, a lightweight channel-wise attention module is applied to reduce the influence of background information. Finally, to make full use of the contextual information between video frames, a deep feature prediction module predicts the features at the current moment from the features at historical moments, and the prediction error is used to judge abnormality. The proposed method was tested and evaluated on two benchmark datasets, UCSD Ped2 and UMN. The experimental results show that it is more effective than the compared state-of-the-art methods.
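The multi-scale step relies on how dilation widens a kernel's receptive field without adding weights: a k-tap kernel with dilation rate d spans k + (k - 1)(d - 1) input positions. The minimal 1-D Python sketch below illustrates this. The kernel, the toy input and the dilation rates (1, 2, 4) are illustrative choices, not the paper's configuration.

```python
def effective_kernel_size(k: int, d: int) -> int:
    """Number of input positions spanned by a k-tap kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv1d(signal, weights, d):
    """'Valid' 1-D dilated convolution (cross-correlation, as in CNN layers).

    Tap i reads the input at offset i * d, so a 3-tap kernel with d = 2
    samples positions t, t + 2 and t + 4 while keeping only 3 weights.
    """
    span = effective_kernel_size(len(weights), d)
    return [
        sum(w * signal[t + i * d] for i, w in enumerate(weights))
        for t in range(len(signal) - span + 1)
    ]


# Parallel branches: same 3-tap kernel, different (illustrative) dilation rates.
x = [float(v) for v in range(10)]  # toy 1-D feature row 0.0 ... 9.0
kernel = [1.0, 1.0, 1.0]           # sums the three sampled positions
branches = {d: dilated_conv1d(x, kernel, d) for d in (1, 2, 4)}
for d, out in branches.items():
    print(f"d={d}: span={effective_kernel_size(3, d)}, outputs={out}")
```

In 2-D, a 3×3 kernel with rates 1, 2 and 4 likewise covers 3×3, 5×5 and 9×9 regions, which is why parallel dilated branches can respond to both small and large objects before their outputs are fused. In a real network the branches are usually zero-padded to a common spatial size so that they can be concatenated or summed.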


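With the continued development of public security systems, surveillance cameras are widely deployed in public places such as shopping malls, streets and banks. Because the volume of surveillance video is enormous, detecting abnormal events manually consumes a great deal of manpower and material resources [1-4]. Building an efficient automatic abnormal event detection system is therefore very important, and it is also an important research direction in computer vision.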

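Abnormal event detection methods can be broadly divided into methods based on hand-crafted features and methods based on deep learning, and the latter have been widely studied in recent years [1,5-10]. Owing to the strong generative capability of deep neural networks, reconstruction- and prediction-based detection methods are widely used. Reference [1] pioneered the use of the U-Net in abnormal event detection: it predicts the future frame from historical video frames and detects anomalies according to the prediction error. Reference [5] modified the U-Net into a two-stream network whose two streams reconstruct and predict video frames respectively, trained it adversarially to generate more realistic images, and judged abnormality according to the reconstruction error.

Because a video consists of a sequence of strongly correlated images, many researchers have exploited temporal information for video abnormal event detection. Reference [7] uses 3D convolutions to extract spatial and temporal features from input video clips, and two 3D deconvolution branches to perform reconstruction and prediction respectively. Recurrent neural networks (RNNs) and their variants have been applied to abnormal event detection because of their strong ability to encode temporal information. Reference [8] combines an LSTM network with soft and hard attention in a pedestrian trajectory prediction network that considers not only a pedestrian's historical trajectory but also the influence of the pedestrian's neighbors on that trajectory. Reference [9] combines a convolutional autoencoder with ConvLSTM: the convolutional autoencoder captures changes in spatial features, the ConvLSTM models how these features change over time, and optical flow serves as supplementary information, so that anomalies are analyzed from a global-local perspective.

In addition, because surveillance cameras mostly have fixed viewpoints, objects at different distances from the camera appear at different sizes in the video, so multi-scale features have been introduced into detection models. Reference [10] proposes a bidirectional multi-scale aggregation network that uses dilated convolutions with different dilation rates to extract features with receptive fields of different sizes, and uses ConvLSTM to encode temporal information bidirectionally.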
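Reference [1] turns prediction error into an anomaly score by computing the peak signal-to-noise ratio (PSNR) between each predicted frame and the real frame, then min-max normalizing the PSNR values within each test video; lower normalized scores indicate likely anomalies. The minimal pure-Python sketch below illustrates that scoring step, assuming intensities scaled to [0, 1] (peak value 1.0) and frames flattened into lists. The small floor on the MSE is an added safeguard, not part of [1].

```python
import math


def psnr(frame, predicted, peak=1.0):
    """PSNR in dB between a real and a predicted frame (flat lists of pixels)."""
    mse = sum((a - b) ** 2 for a, b in zip(frame, predicted)) / len(frame)
    mse = max(mse, 1e-10)  # safeguard: avoid division by zero for a perfect prediction
    return 10.0 * math.log10(peak ** 2 / mse)


def regularity_scores(psnrs):
    """Min-max normalize one video's per-frame PSNRs to [0, 1].

    High score: frame was predicted well (regular).
    Low score: frame was predicted poorly (candidate anomaly).
    """
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [1.0 for _ in psnrs]
    return [(p - lo) / (hi - lo) for p in psnrs]


# Toy video of three 4-pixel frames whose predictions get progressively worse.
real = [[0.5] * 4, [0.5] * 4, [0.5] * 4]
pred = [[0.51] * 4, [0.6] * 4, [0.8] * 4]
scores = regularity_scores([psnr(r, p) for r, p in zip(real, pred)])
print(scores)
```

Because normalization is done per video, the score ranks frames relative to the rest of the same video rather than against a global scale.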

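Although video anomaly detection has made some progress, several problems remain, such as variation in object size within a video, interference from complex backgrounds, and the fact that what counts as an anomaly differs between scenes. To address these problems, this paper proposes an abnormal event detection method that makes full use of multi-scale features and spatio-temporal information. First, features are extracted with a pretrained VGG16 network, and a multi-scale feature fusion module gathers information from receptive fields of different sizes to obtain a complete representation of the input video frame. Second, a lightweight channel-wise attention module emphasizes the important foreground information in the video, reducing the influence of background information on detection. On this basis, the features at the current moment are predicted from the features at historical moments, which compensates for the limited use of contextual and temporal information in the preceding modules. In the training stage, the network is trained by minimizing the Euclidean distance between the predicted and the real features. In the testing stage, we assume that video frames containing only normal events can be predicted well, whereas frames containing abnormal events produce large prediction errors; abnormality is therefore judged from the prediction error. Experiments on two benchmark datasets, UCSD Ped2 and UMN, demonstrate the effectiveness of the proposed method.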
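The test-time hypothesis above (normal frames are predictable, abnormal ones are not) can be illustrated with a toy feature sequence. In the sketch below, the learned deep feature prediction module is replaced by a hypothetical stand-in, linear extrapolation from the last two feature vectors; the 2-D "features" and the threshold are likewise hypothetical. Only the Euclidean prediction error comes from the text: its mean over normal training clips is the quantity that training would minimize.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict_next(history):
    """Hypothetical stand-in for the learned predictor: linear extrapolation
    from the last two feature vectors, f_hat = 2 * f[t-1] - f[t-2]."""
    prev2, prev1 = history[-2], history[-1]
    return [2 * b - a for a, b in zip(prev2, prev1)]


def prediction_errors(features):
    """Per-frame error ||f_hat_t - f_t|| for every frame t >= 2.

    During training, the mean of these errors over normal clips is what
    would be minimized; at test time, each error is an anomaly score.
    """
    return [
        euclidean(predict_next(features[t - 2:t]), features[t])
        for t in range(2, len(features))
    ]


# Toy 2-D "features": smooth drift (normal), then a sudden jump at t = 5.
feats = [[0.0, 0.0], [1.0, 0.5], [2.0, 1.0], [3.0, 1.5], [4.0, 2.0],
         [9.0, 7.0], [10.0, 7.5]]
errs = prediction_errors(feats)
THRESHOLD = 1.0  # hypothetical; in practice chosen or swept on validation data
flagged = [t for t, e in zip(range(2, len(feats)), errs) if e > THRESHOLD]
print([round(e, 3) for e in errs])
print(flagged)
```

Frame 6 is flagged as well as frame 5, because its prediction is made from a history that already contains the abnormal frame. Any predictor conditioned on past features shows this lag, so anomaly decisions near the end of an abnormal event can spill over into the following frames.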

5. Conclusion

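This paper proposes an abnormal event detection network that makes full use of the multi-scale and temporal information in video; it attends not only to global-local information but also to spatio-temporal information. The network uses dilated convolutions to gather information from several receptive fields of different sizes and fuses it into a global-local representation of the whole video frame. It also introduces a lightweight channel-wise attention mechanism that estimates how important the information in each channel of the feature map is, increasing the weights of important channels and suppressing interference such as background and noise. Finally, to make full use of temporal information, an autoencoder encodes the feature sequence of historical moments and predicts the features at the current moment, and the error between the predicted and the real features is used to judge abnormality. Comparative experiments against several methods on two benchmark datasets demonstrate the effectiveness of the proposed method.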
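The channel-wise attention described above follows the squeeze-and-excitation pattern of SENet [12]: pool each channel to a single descriptor, map the descriptors to per-channel weights in (0, 1), and rescale the channels. SENet computes the weights with two fully connected layers. The sketch below instead uses one small 1-D convolution across neighboring channel descriptors, a common lightweight variant (used, for example, by ECA-Net). This is an assumption for illustration, not the authors' exact module, and the kernel values are hypothetical stand-ins for learned weights.

```python
import math


def channel_attention(feature_map, kernel=(0.25, 0.5, 0.25)):
    """Reweight the channels of a C x H x W feature map (nested lists).

    Squeeze: global average pooling gives one descriptor per channel.
    Excite: a 1-D convolution across neighboring descriptors (zero padding)
            followed by a sigmoid gives one weight in (0, 1) per channel.
    Scale:  every value in channel c is multiplied by that channel's weight.
    The kernel values are hypothetical; in a network they would be learned.
    """
    descriptors = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map
    ]
    half = len(kernel) // 2
    n = len(descriptors)
    weights = []
    for c in range(n):
        s = 0.0
        for j, k in enumerate(kernel):
            idx = c + j - half
            if 0 <= idx < n:  # zero padding outside the channel range
                s += k * descriptors[idx]
        weights.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid gate
    scaled = [
        [[v * w for v in row] for row in ch] for ch, w in zip(feature_map, weights)
    ]
    return scaled, weights


feature_map = [
    [[4.0, 4.0], [4.0, 4.0]],  # strongly activated channel (e.g. foreground)
    [[1.0, 0.0], [0.0, 1.0]],  # weakly activated channel
    [[0.0, 0.0], [0.0, 0.0]],  # silent channel (e.g. background)
]
scaled, weights = channel_attention(feature_map)
print([round(w, 3) for w in weights])
```

Replacing the two fully connected layers with a k-tap convolution reduces the excitation step to k parameters regardless of the number of channels, at the cost of only modeling interactions between nearby channels.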

References

[1]	LIU W, LUO W, LIAN D, et al. Future frame prediction for anomaly detection - a new baseline[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2018: 6536-6545.
[2]	LU C, SHI J, JIA J. Abnormal event detection at 150 fps in MATLAB[C]//2013 IEEE International Conference on Computer Vision. [S.l.]: IEEE, 2013: 2720-2727.
[3]	SULTANI W, CHEN C, SHAH M. Real-World anomaly detection in surveillance videos[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2018: 6479-6488.
[4]	SONG H, SUN C, WU X, et al. Learning normal patterns via adversarial attention-based autoencoder for abnormal event detection in videos[J]. IEEE Transactions on Multimedia, 2020, 22(8): 2138-2148.
[5]	NGUYEN T N, MEUNIER J. Anomaly detection in video sequence with appearance-motion correspondence[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). [S.l.]: IEEE, 2019: 1273-1283.
[6]	CHANG Y, TU Z, XIE W, et al. Clustering driven deep autoencoder for video anomaly detection[C]//16th European Conference on Computer Vision (ECCV). Glasgow: Springer, 2020: 329-345.
[7]	ZHAO Y, DENG B, SHEN C, et al. Spatio-temporal autoencoder for video anomaly detection[C]//Proceedings of the 25th ACM International Conference on Multimedia. New York: Association for Computing Machinery, 2017: 1933-1941.
[8]	FERNANDO T, DENMAN S, SRIDHARAN S, et al. Soft + hardwired attention: An LSTM framework for human trajectory prediction and abnormal event detection[J]. Neural Networks, 2018, 108: 466-478.
[9]	YANG B, CAO J, WANG N, et al. Anomalous behaviors detection in moving crowds based on a weighted convolutional autoencoder-long short-term memory network[J]. IEEE Transactions on Cognitive and Developmental Systems, 2019, 11(4): 473-482.
[10]	LEE S, KIM H G, RO Y M. BMAN: Bidirectional multi-scale aggregation networks for abnormal event detection[J]. IEEE Transactions on Image Processing, 2020, 29: 2395-2408.
[11]	ABATI D, PORRELLO A, CALDERARA S, et al. Latent space autoregression for novelty detection[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). [S.l.]: IEEE, 2019: 481-490.
[12]	HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2018: 7132-7141.
[13]	ZHANG Y, LU H C, ZHANG L H, et al. Combining motion and appearance cues for anomaly detection[J]. Pattern Recognition, 2016, 51: 443-452.
[14]	MEHRAN R, OYAMA A, SHAH M. Abnormal crowd behavior detection using social force model[C]//2009 IEEE Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2009: 935-942.
[15]	CONG Y, YUAN J, LIU J. Sparse reconstruction cost for abnormal event detection[C]//2011 IEEE Conference on Computer Vision and Pattern Recognition. Colorado: IEEE, 2011: 3449-3456.
[16]	ZHANG Y, LU H C, ZHANG L H, et al. Video anomaly detection based on locality sensitive hashing filters[J]. Pattern Recognition, 2016, 59: 302-311.
[17]	LIU Y S, LI C L, PÓCZOS B. Classifier two-sample test for video anomaly detections[EB/OL]. [2021-10-11]. http://www.bmva.org/bmvc/2018/contents/papers/0237.pdf.
[18]	WANG Y, ZHANG Q, LI B. Efficient unsupervised abnormal crowd activity detection based on a spatiotemporal saliency detector[C]//2016 IEEE Winter Conference on Applications of Computer Vision (WACV). [S.l.]: IEEE, 2016: 1-9.
[19]	MAHADEVAN V, LI W, BHALODIA V, et al. Anomaly detection in crowded scenes[C]//2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. [S.l.]: IEEE, 2010: 1975-1981.

Layer	Filter/Stride	Activation function
Conv1	(1×1×512)/1	ReLU
Conv2	(1×1×256)/1	ReLU
Conv3	(1×1×128)/1	−
Conv4	(1×1×256)/1	ReLU
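If each entry in the table above is read as a 1×1 kernel with the stated number of output channels and stride 1 (an assumption about the notation), the stack is a per-pixel channel-mixing network. The Python sketch below tracks channel widths and parameter counts through the four layers and shows that a 1×1 convolution is a fully connected layer applied independently at each spatial position. The 512-channel input width (the depth of VGG16's last convolutional blocks) and the use of biases are assumptions; the section states neither.

```python
# Hypothetical input width: 512 channels, the depth of VGG16's last conv blocks.
IN_CHANNELS = 512
LAYERS = [  # (name, output channels, activation), as listed in the table
    ("Conv1", 512, "ReLU"),
    ("Conv2", 256, "ReLU"),
    ("Conv3", 128, None),
    ("Conv4", 256, "ReLU"),
]


def conv1x1_params(c_in, c_out, bias=True):
    """Parameters of a 1x1 convolution: one c_in-long filter per output channel."""
    return c_in * c_out + (c_out if bias else 0)


def conv1x1(pixel, weight, bias, relu):
    """Apply a 1x1 convolution at one spatial position.

    `pixel` is a c_in-long channel vector; `weight` holds c_out rows of c_in.
    The same weights are reused at every (h, w), so a 1x1 convolution is a
    per-pixel fully connected layer over channels.
    """
    out = [sum(w * x for w, x in zip(row, pixel)) + b for row, b in zip(weight, bias)]
    return [max(0.0, v) for v in out] if relu else out


c_in, total = IN_CHANNELS, 0
for name, c_out, act in LAYERS:
    n = conv1x1_params(c_in, c_out)
    total += n
    print(f"{name}: {c_in} -> {c_out} channels, {n} parameters, activation {act or 'none'}")
    c_in = c_out
print("total:", total)
```

Under these assumptions the 128-channel Conv3, which has no activation, is the narrowest point of the stack.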

Method	Ped2	UMN
Without channel-wise attention	0.468	0.395
With SENet	0.493	0.413
With proposed attention module	0.502	0.429


Corresponding author: CHEN Bin, bchen63@163.com
