Abstract
Omnidirectional cameras are extensively used in various applications to provide a wide field of view. However, they face a challenge in synthesizing novel views due to the inevitable presence of dynamic objects, including the photographer, in their wide field of view. In this paper, we introduce a new approach called Omnidirectional Local Radiance Fields (OmniLocalRF) that can render static-only scene views, simultaneously removing and inpainting dynamic objects. Our approach combines the principles of local radiance fields with the bidirectional optimization of omnidirectional rays. Our input is an omnidirectional video, and we evaluate mutual observations over the full angular range between previous and current frames. To reduce ghosting artifacts from dynamic objects and inpaint occlusions, we devise a multi-resolution motion mask prediction module. Unlike existing methods that primarily separate dynamic components through the temporal domain, our method uses multi-resolution neural feature planes for precise segmentation, which is more suitable for long 360-degree videos. Our experiments validate that OmniLocalRF outperforms existing methods in both qualitative and quantitative evaluations, especially in scenarios with complex real-world scenes. In particular, our approach eliminates the need for manual interaction, such as hand-drawn motion masks or additional pose estimation, making it a highly effective and efficient solution.
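The core idea of the motion mask module — sampling learnable 2D feature planes at several resolutions and decoding the gathered features into a per-pixel motion probability — can be sketched as follows. This is a toy illustration only: the paper's actual feature dimensions, plane parameterization, and decoder are not given in the abstract, so the scalar planes, the summation in place of an MLP decoder, and all function names here are hypothetical.

```python
import math

def bilinear_sample(grid, u, v):
    """Bilinearly sample a 2D scalar feature grid at normalized coords u, v in [0, 1].
    `grid` is a list of rows (a stand-in for one learnable feature plane)."""
    h, w = len(grid), len(grid[0])
    x, y = u * (w - 1), v * (h - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
    bot = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def motion_mask_probability(planes, u, v):
    """Predict a motion probability at pixel (u, v) from multi-resolution planes.
    `planes` is a coarse-to-fine list of 2D grids; summing the sampled features
    is a toy stand-in for the learned decoder the paper would use."""
    feat = sum(bilinear_sample(p, u, v) for p in planes)
    return 1.0 / (1.0 + math.exp(-feat))  # sigmoid -> probability in (0, 1)
```

In a real implementation the planes would be learned tensors optimized jointly with the radiance field, and the coarse levels would capture broad moving regions while finer levels sharpen the mask boundaries.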
URL
https://arxiv.org/abs/2404.00676