Paper Reading AI Learner

WinDB: HMD-free and Distortion-free Panoptic Video Fixation Learning

2023-05-23 10:25:22
Guotao Wang, Chenglizhao Chen, Aimin Hao, Hong Qin, Deng-ping Fan

Abstract

To date, the widely adopted way to collect fixations in panoptic video is based on a head-mounted display (HMD): participants' fixations are recorded while they wear an HMD and freely explore the given panoptic scene. However, this widely used data collection method is insufficient for training deep models to accurately predict which regions of a given panoptic video are most important when it contains intermittent salient events. The main reason is that "blind zooms" always exist when using an HMD to collect fixations, since participants cannot keep spinning their heads to explore the entire panoptic scene all the time. Consequently, the collected fixations tend to be trapped in some local views, leaving the remaining areas as "blind zooms". Therefore, fixation data collected by HMD-based methods, which accumulate local views, cannot accurately represent the overall global importance of complex panoramic scenes. This paper introduces the auxiliary Window with a Dynamic Blurring (WinDB) fixation collection approach for panoptic video, which needs no HMD and is blind-zoom-free; thus, the collected fixations can faithfully reflect the region-wise importance of the scene. Using our WinDB approach, we have released a new PanopticVideo-300 dataset, containing 300 panoptic clips covering over 225 categories. Besides, we present a simple baseline design that takes full advantage of PanopticVideo-300 to handle the fixation shifting problem induced by the blind-zoom-free attribute. Our WinDB approach, PanopticVideo-300, and tailored fixation prediction model are all publicly available at this https URL.
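The abstract does not spell out WinDB's blurring mechanics, but the core idea — present the panorama on an ordinary screen and dynamically blur everything outside a sharp viewing window — can be sketched as below. All function names, the box-blur choice, and the window geometry are illustrative assumptions, not the authors' implementation; the horizontal wrap-around mimics the longitude continuity of an equirectangular (ERP) panorama.

```python
import numpy as np

def box_blur(img, k=5):
    """Naive k x k box blur with wrap-around (hypothetical helper; WinDB's
    actual blur design may differ). Horizontal wrap matches the longitude
    continuity of an equirectangular panorama."""
    r = k // 2
    acc = np.zeros_like(img, dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            acc += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return acc / (k * k)

def windb_frame(frame, center, win_h, win_w, k=5):
    """Blur the whole panoramic frame, then paste the sharp original pixels
    back inside the viewing window centered at `center` = (row, col).
    The window may wrap around the horizontal (longitude) edge."""
    H, W = frame.shape[:2]
    out = box_blur(frame, k)
    cy, cx = center
    y0, y1 = max(0, cy - win_h // 2), min(H, cy + win_h // 2)
    xs = np.arange(cx - win_w // 2, cx + win_w // 2) % W  # horizontal wrap
    out[y0:y1, xs] = frame[y0:y1, xs]
    return out

# Toy frame: a 20x20 grayscale "panorama" with random texture.
rng = np.random.default_rng(0)
frame = rng.random((20, 20))
out = windb_frame(frame, center=(10, 10), win_h=6, win_w=6, k=5)
```

In an actual collection session the window center would track the participant's current gaze (e.g. from a desktop eye tracker), so the blur steers attention the way an HMD viewport would, without ever hiding the rest of the scene.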

URL

https://arxiv.org/abs/2305.13901

PDF

https://arxiv.org/pdf/2305.13901.pdf
