Abstract
We introduce a method that simultaneously learns to explore new large environments and to reconstruct them in 3D from color images only. This is closely related to the Next Best View (NBV) problem, in which one must identify where to move the camera next to improve the coverage of an unknown scene. However, most current NBV methods rely on depth sensors, need 3D supervision, and/or do not scale to large scenes. Our method requires only a color camera and no 3D supervision. In a self-supervised fashion, it simultaneously learns to predict a "volume occupancy field" from color images and, from this field, to predict the NBV. Thanks to this approach, our method generalizes well to new scenes, as it is not biased towards any particular training 3D data. We demonstrate this on a recent dataset composed of various 3D scenes, and show that our method even outperforms recent methods that require a depth sensor, an unrealistic assumption for outdoor scenes captured with a flying drone.
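To make the NBV idea concrete, here is a minimal, hypothetical sketch of a greedy NBV exploration loop. It is not the paper's implementation: `visible_voxels` stands in for the learned volume occupancy field (which in the paper is predicted from color images by a self-supervised network), and coverage gain is simply the count of not-yet-observed voxels a candidate view would reveal.

```python
# Conceptual sketch of greedy Next Best View (NBV) selection.
# All names are illustrative assumptions, not the paper's code:
# `visible_voxels` simulates what the learned occupancy field would
# tell us each candidate camera pose can observe.

def select_next_best_view(candidate_views, observed, visible_voxels):
    """Pick the candidate view that uncovers the most unseen voxels.

    candidate_views: iterable of view ids
    observed: set of voxel ids already covered
    visible_voxels: dict mapping view id -> set of voxel ids it covers
    """
    def coverage_gain(view):
        return len(visible_voxels[view] - observed)
    return max(candidate_views, key=coverage_gain)

def explore(candidate_views, visible_voxels, steps):
    """Greedy exploration: repeatedly move to the NBV, update coverage."""
    observed = set()
    trajectory = []
    for _ in range(steps):
        view = select_next_best_view(candidate_views, observed, visible_voxels)
        observed |= visible_voxels[view]
        trajectory.append(view)
    return trajectory, observed

# Toy example: three candidate views over six voxels.
vis = {"a": {0, 1, 2}, "b": {2, 3}, "c": {3, 4, 5}}
traj, covered = explore(["a", "b", "c"], vis, steps=2)
```

In the toy example, the loop first picks view "a" (3 new voxels), then "c" (3 more), covering all six voxels in two steps; the paper's contribution is learning to estimate this coverage gain from color images alone, without depth sensors or 3D ground truth.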
URL
https://arxiv.org/abs/2303.03315