Paper Reading AI Learner

WinDB: HMD-free and Distortion-free Panoptic Video Fixation Learning

2023-05-23 10:25:22
Guotao Wang, Chenglizhao Chen, Aimin Hao, Hong Qin, Deng-ping Fan

Abstract

To date, the widely adopted way to collect fixations in panoptic video is based on a head-mounted display (HMD): participants' fixations are recorded while they wear an HMD and freely explore the given panoptic scene. However, this widely used data collection method is insufficient for training deep models to accurately predict which regions of a given panoptic video are most important when it contains intermittent salient events. The main reason is that "blind zooms" always exist when using an HMD to collect fixations, since participants cannot keep spinning their heads to explore the entire panoptic scene all the time. Consequently, the collected fixations tend to be trapped in some local views, leaving the remaining areas as "blind zooms". Therefore, fixation data collected by HMD-based methods, which accumulate local views, cannot accurately represent the overall global importance of complex panoramic scenes. This paper introduces the auxiliary Window with a Dynamic Blurring (WinDB) fixation collection approach for panoptic video, which needs no HMD and is blind-zoom-free; thus, the collected fixations can faithfully reflect the region-wise importance of the scene. Using our WinDB approach, we have released a new PanopticVideo-300 dataset, containing 300 panoptic clips covering over 225 categories. Besides, we present a simple baseline design that takes full advantage of PanopticVideo-300 to handle the fixation shifting problem induced by the blind-zoom-free attribute. Our WinDB approach, PanopticVideo-300, and tailored fixation prediction model are all publicly available at this https URL.
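The abstract does not spell out WinDB's blurring mechanics, but the core idea — present the panorama on an ordinary screen and dynamically blur everything outside a sharp viewing window — can be sketched as below. All function names, the box-blur choice, and the window geometry are illustrative assumptions, not the authors' implementation; the horizontal wrap-around mimics the longitude continuity of an equirectangular (ERP) panorama.

```python
import numpy as np

def box_blur(img, k=5):
    """Naive k x k box blur with wrap-around (hypothetical helper; WinDB's
    actual blur design may differ). Horizontal wrap matches the longitude
    continuity of an equirectangular panorama."""
    r = k // 2
    acc = np.zeros_like(img, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            acc += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return acc / (k * k)

def windb_frame(frame, center, win_h, win_w, k=5):
    """Blur the whole panoramic frame, then paste the sharp original pixels
    back inside the viewing window centered at `center` = (row, col).
    The window may wrap around the horizontal (longitude) edge."""
    H, W = frame.shape[:2]
    out = box_blur(frame, k)
    cy, cx = center
    y0, y1 = max(0, cy - win_h // 2), min(H, cy + win_h // 2)
    xs = np.arange(cx - win_w // 2, cx + win_w // 2) % W  # horizontal wrap
    out[y0:y1, xs] = frame[y0:y1, xs]
    return out

# Toy frame: a 20x20 grayscale "panorama" with random texture.
rng = np.random.default_rng(0)
frame = rng.random((20, 20))
out = windb_frame(frame, center=(10, 10), win_h=6, win_w=6, k=5)
```

In an actual collection session the window center would track the participant's current gaze (e.g. from a desktop eye tracker), so the blur steers attention the way an HMD viewport would, without ever hiding the rest of the scene.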

URL

https://arxiv.org/abs/2305.13901

PDF

https://arxiv.org/pdf/2305.13901.pdf
