Paper Reading AI Learner

Bridge the Gap Between VQA and Human Behavior on Omnidirectional Video: A Large-Scale Dataset and a Deep Learning Model

2018-07-29 02:03:14
Chen Li, Mai Xu, Xinzhe Du, Zulin Wang

Abstract

Omnidirectional video provides spherical stimuli with a $360^\circ \times 180^\circ$ viewing range. However, an observer can see only the viewport region of an omnidirectional video through head movement (HM), and can clearly perceive an even smaller region within the viewport through eye movement (EM). Thus, the subjective quality of omnidirectional video may be correlated with the HM and EM of human behavior. To bridge the gap between subjective quality and human behavior, this paper proposes a large-scale visual quality assessment (VQA) dataset of omnidirectional video, called VQA-OV, which contains 60 reference sequences and 540 impaired sequences. Our VQA-OV dataset provides not only the subjective quality scores of the sequences but also the HM and EM data of the subjects. By mining our dataset, we find that the subjective quality of omnidirectional video is indeed related to HM and EM. Hence, we develop a deep learning model that embeds HM and EM for objective VQA on omnidirectional video. Experimental results show that our model significantly improves the state-of-the-art performance of VQA on omnidirectional video.
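The abstract does not spell out how HM and EM are embedded into the model, so the following is only a minimal PyTorch sketch of one plausible fusion scheme: a video frame is concatenated channel-wise with HM and EM heatmaps, encoded by a small CNN, and regressed to a single quality score. All layer sizes, the concatenation-based fusion, and the class name `HMEMQualityNet` are illustrative assumptions, not the authors' actual architecture.

```python
# Hypothetical sketch: fuse an omnidirectional-video frame with
# head-movement (HM) and eye-movement (EM) heatmaps to regress a
# subjective quality score. NOT the paper's architecture; the layer
# sizes and fusion scheme are assumptions for illustration only.
import torch
import torch.nn as nn

class HMEMQualityNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional encoder over 5 input channels:
        # 3 RGB channels of the frame plus 1 HM map and 1 EM map.
        self.encoder = nn.Sequential(
            nn.Conv2d(5, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # global pooling to (B, 64, 1, 1)
        )
        # Regression head mapping pooled features to one quality score.
        self.head = nn.Linear(64, 1)

    def forward(self, frame, hm_map, em_map):
        # frame: (B, 3, H, W); hm_map, em_map: (B, 1, H, W)
        x = torch.cat([frame, hm_map, em_map], dim=1)
        feat = self.encoder(x).flatten(1)
        return self.head(feat)

# Usage on dummy data shaped like a small equirectangular frame.
model = HMEMQualityNet()
frame = torch.rand(2, 3, 128, 256)
hm = torch.rand(2, 1, 128, 256)
em = torch.rand(2, 1, 128, 256)
print(model(frame, hm, em).shape)  # torch.Size([2, 1])
```

Weighting or concatenating attention-like maps (here HM/EM) with pixel data is a common way to let a quality model focus on the regions observers actually look at; the paper's real model may fuse these signals differently.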


URL

https://arxiv.org/abs/1807.10990

PDF

https://arxiv.org/pdf/1807.10990.pdf

