Abstract
We address the task of estimating camera parameters from a set of images depicting a scene. Popular feature-based structure-from-motion (SfM) tools solve this task by incremental reconstruction: they repeat triangulation of sparse 3D points and registration of more camera views to the sparse point cloud. We re-interpret incremental structure-from-motion as an iterated application and refinement of a visual relocalizer, that is, of a method that registers new views to the current state of the reconstruction. This perspective allows us to investigate alternative visual relocalizers that are not rooted in local feature matching. We show that scene coordinate regression, a learning-based relocalization approach, allows us to build implicit, neural scene representations from unposed images. Different from other learning-based reconstruction methods, we do not require pose priors nor sequential inputs, and we optimize efficiently over thousands of images. Our method, ACE0 (ACE Zero), estimates camera poses to an accuracy comparable to feature-based SfM, as demonstrated by novel view synthesis. Project page: this https URL
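The abstract's core idea, incremental reconstruction as iterated application and refinement of a relocalizer, can be sketched as a simple loop: seed the scene from one image, register whatever views the current relocalizer can pose, refine the scene representation on the grown set, and repeat. The sketch below is purely illustrative; `relocalize`, `refine`, and the dictionary-based scene model are hypothetical stand-ins, not the actual ACE0 API, and the toy confidence rule merely mimics the relocalizer improving as more views are registered.

```python
# Hypothetical sketch: incremental reconstruction as iterated relocalization.
# All names (relocalize, refine, the scene-model dict) are illustrative
# placeholders, not the real ACE0 implementation.

def relocalize(scene_model, image):
    """Toy stand-in: confidence grows with the number of registered views.

    A real relocalizer (e.g. scene coordinate regression) would predict
    2D-3D correspondences for the image and solve for its camera pose.
    """
    confidence = min(1.0, 0.4 + 0.2 * len(scene_model["views"]))
    pose = ("pose_of", image)  # placeholder pose estimate
    return pose, confidence

def refine(scene_model):
    """Toy stand-in for retraining the implicit scene representation."""
    scene_model["refinements"] += 1

def reconstruct(images, threshold=0.5):
    # Seed the reconstruction from the first image at the identity pose.
    scene_model = {"views": {images[0]: "identity"}, "refinements": 0}
    unposed = set(images[1:])
    while unposed:
        registered = set()
        for image in unposed:
            pose, confidence = relocalize(scene_model, image)
            if confidence >= threshold:
                scene_model["views"][image] = pose
                registered.add(image)
        if not registered:
            break  # no progress: remaining views cannot be registered
        unposed -= registered
        refine(scene_model)  # refine the relocalizer on the grown view set
    return scene_model

model = reconstruct([f"img_{i}" for i in range(5)])
print(len(model["views"]), model["refinements"])
```

Because registration needs no sequential ordering, each pass may pose many views at once, which is one reason the paper can avoid pose priors and ordered inputs.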
URL
https://arxiv.org/abs/2404.14351