Abstract
In this paper, we present SPVLoc, a global indoor localization method that accurately determines the six-dimensional (6D) camera pose of a query image while requiring minimal scene-specific prior knowledge and no scene-specific training. Our approach employs a novel matching procedure to localize the perspective camera's viewport, given as an RGB image, within a set of panoramic semantic layout representations of the indoor environment. The panoramas are rendered from an untextured 3D reference model, which comprises only approximate structural information about room shapes, along with door and window annotations. We demonstrate that a straightforward convolutional network structure can successfully achieve image-to-panorama and, ultimately, image-to-model matching. Using a viewport classification score, we rank the reference panoramas and select the best match for the query image. A 6D relative pose is then estimated between the chosen panorama and the query image. Our experiments show that this approach not only efficiently bridges the domain gap but also generalizes well to previously unseen scenes that are not part of the training data. Moreover, it achieves superior localization accuracy compared to state-of-the-art methods while also estimating more degrees of freedom of the camera pose. We will make our source code publicly available at this https URL.
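The ranking step described above can be illustrated with a minimal sketch: each reference panorama receives a viewport classification score for the query image, and the best-scoring panorama is selected before relative pose estimation. The function names and the dot-product similarity below are hypothetical placeholders, not the convolutional matching network used by SPVLoc.

```python
def score_viewport(query_features, panorama_features):
    # Toy similarity: dot product of feature vectors. This stands in
    # for the learned viewport classification score; the actual
    # network architecture is not reproduced here.
    return sum(q * p for q, p in zip(query_features, panorama_features))

def select_best_panorama(query_features, panoramas):
    # Rank all reference panoramas by their score against the query
    # image and return the index of the best match plus all scores.
    scores = [score_viewport(query_features, p) for p in panoramas]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Usage with toy feature vectors for one query and three panoramas:
query = [0.2, 0.9, 0.1]
panos = [[1.0, 0.0, 0.0], [0.1, 1.0, 0.2], [0.0, 0.1, 1.0]]
idx, scores = select_best_panorama(query, panos)
# The selected panorama would then be passed to the 6D relative
# pose estimation stage described in the abstract.
```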
URL
https://arxiv.org/abs/2404.10527