Paper Reading AI Learner

OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos

2024-03-31 12:55:05
Dongyoung Choi, Hyeonjoong Jang, Min H. Kim

Abstract

Omnidirectional cameras are extensively used in various applications to provide a wide field of view. However, they face a challenge in synthesizing novel views due to the inevitable presence of dynamic objects, including the photographer, in their wide field of view. In this paper, we introduce a new approach called Omnidirectional Local Radiance Fields (OmniLocalRF) that renders views of the static scene alone, simultaneously removing dynamic objects and inpainting the regions they occlude. Our approach combines the principles of local radiance fields with the bidirectional optimization of omnidirectional rays. Our input is an omnidirectional video, and we evaluate mutual observations across the entire angular range between previous and current frames. To reduce ghosting artifacts from dynamic objects and to inpaint occlusions, we devise a multi-resolution motion mask prediction module. Unlike existing methods that primarily separate dynamic components through the temporal domain, our method uses multi-resolution neural feature planes for precise segmentation, which is more suitable for long 360-degree videos. Our experiments validate that OmniLocalRF outperforms existing methods in both qualitative and quantitative evaluations, especially in scenarios with complex real-world scenes. In particular, our approach eliminates the need for manual interaction, such as hand-drawn motion masks and additional pose estimation, making it a highly effective and efficient solution.
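The abstract's "omnidirectional rays" refer to casting one ray per pixel of a 360-degree (equirectangular) video frame. As a minimal sketch of that geometric primitive, the snippet below converts equirectangular pixel coordinates into unit ray directions with NumPy. The axis convention (longitude sweeping left to right over [-π, π), latitude top to bottom over [π/2, -π/2], +z forward, +y up) is an assumption for illustration, not taken from the paper.

```python
import numpy as np

def equirect_rays(width, height):
    """Unit ray directions for every pixel of an equirectangular frame.

    Assumed convention (not from the paper): longitude spans [-pi, pi)
    left to right, latitude spans [pi/2, -pi/2] top to bottom;
    +z is forward, +y is up, +x is right.
    Returns an array of shape (height, width, 3).
    """
    # Pixel centers, normalized to [0, 1)
    u = (np.arange(width) + 0.5) / width
    v = (np.arange(height) + 0.5) / height
    lon = (u - 0.5) * 2.0 * np.pi   # longitude per column
    lat = (0.5 - v) * np.pi         # latitude per row (top row is positive)
    lon, lat = np.meshgrid(lon, lat)  # both (height, width)
    dirs = np.stack([
        np.cos(lat) * np.sin(lon),  # x: right
        np.sin(lat),                # y: up
        np.cos(lat) * np.cos(lon),  # z: forward
    ], axis=-1)
    return dirs
```

These directions, offset by each frame's estimated camera position, define the rays that a radiance field is sampled along; the full-sphere coverage is what lets previous and current frames mutually observe the same regions, as the abstract describes.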

URL

https://arxiv.org/abs/2404.00676

PDF

https://arxiv.org/pdf/2404.00676.pdf

