Efficient Decomposition Convolution and Temporal Pyramid Network for Video Face Recognition

Abstract: With the widespread deployment of video surveillance and camera networks, face recognition over continuous video frames in unconstrained scenes is attracting growing attention. Most traditional methods for continuous-frame face recognition suffer from unstable recognition results and heavy computational cost. This paper therefore compares different frame aggregation strategies, uses an attention mechanism to optimize frame aggregation, and models video faces with an efficient 3D decomposition (separable) convolution, which reduces the computational cost of video face recognition and improves recognition accuracy. In addition, a temporal pyramid network is proposed to further mine complementary information across frames and improve recognition accuracy. The effectiveness of the method is verified on the YTF and PaSC datasets.
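
To make the computational saving from decomposing a 3D convolution concrete, the sketch below compares weight counts for a full 3D kernel and a spatial-then-temporal factorization. It is a minimal illustration only: the abstract does not specify the exact decomposition, so the (2+1)D-style split, the function names, and the channel sizes are all assumptions and may differ from the design used in the paper.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of a full 3D convolution (bias ignored)."""
    return c_in * c_out * kt * kh * kw


def factorized_conv3d_params(c_in, c_out, c_mid, kt, kh, kw):
    """Weight count of an assumed spatial-then-temporal factorization.

    A 1 x kh x kw spatial convolution maps c_in -> c_mid channels,
    followed by a kt x 1 x 1 temporal convolution mapping c_mid -> c_out.
    This split is a hypothetical stand-in, not the paper's confirmed design.
    """
    spatial = c_in * c_mid * kh * kw
    temporal = c_mid * c_out * kt
    return spatial + temporal


if __name__ == "__main__":
    # Illustrative sizes only: 64 input/output channels, 3x3x3 kernel.
    full = conv3d_params(64, 64, 3, 3, 3)
    fact = factorized_conv3d_params(64, 64, 64, 3, 3, 3)
    print(full, fact, fact / full)
```

With these illustrative sizes the factorized form needs 49152 weights against 110592 for the full kernel, roughly 44% of the original. The actual saving depends on the intermediate channel width, which is a free choice in this sketch.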