A Video Adaptive Bitrate Algorithm with User QoE Prediction as Reward

Abstract: This paper proposes a deep learning-based user quality of experience (QoE) prediction network (UQPN) that predicts and models the current user's QoE from the current video playback state. UQPN replaces the reward functions used in previous methods, so that the resulting adaptive bitrate (ABR) algorithm makes bitrate decisions that better match user requirements. Experiments show that, compared with existing reward functions, UQPN's predictions have a higher correlation coefficient with the true user QoE, and that the algorithm trained with UQPN as the reinforcement learning reward improves user QoE by at least 20%.
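The comparison described in the abstract, which scores a reward model by how well its outputs correlate with true user QoE, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the state fields (`bitrate_mbps`, `prev_bitrate_mbps`, `rebuffer_s`), the hand-crafted linear reward and its weights, and the choice of the Pearson coefficient are not taken from the paper, which does not specify them here. A learned predictor such as UQPN would be passed in as another callable with the same signature.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical playback state; the field names are illustrative only.
State = Dict[str, float]
RewardFn = Callable[[State], float]


def handcrafted_reward(state: State) -> float:
    """Illustrative rule-based reward: bitrate minus stall and switch penalties.

    The unit weights are placeholders, not values from the paper.
    """
    switch = abs(state["bitrate_mbps"] - state["prev_bitrate_mbps"])
    return state["bitrate_mbps"] - state["rebuffer_s"] - switch


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


def score_reward_model(
    reward_fn: RewardFn, states: Sequence[State], true_qoe: Sequence[float]
) -> float:
    """Correlate a reward model's outputs with user-reported QoE."""
    predicted = [reward_fn(s) for s in states]
    return pearson(predicted, true_qoe)
```

Under these assumptions, two reward models can be compared by calling `score_reward_model` on the same states and ratings; the one with the higher coefficient tracks user QoE more closely.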