Abstract
The representation of geometry in real-time 3D perception systems continues to be a critical research issue. Dense maps capture complete surface shape and can be augmented with semantic labels, but their high dimensionality makes them computationally costly to store and process, and unsuitable for rigorous probabilistic inference. Sparse feature-based representations avoid these problems, but capture only partial scene information and are useful mainly for localisation. We present a new compact but dense representation of scene geometry which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. We are inspired by work both on learned depth from images and on auto-encoders. Our approach is suitable for use in a keyframe-based monocular dense SLAM system: while each keyframe with a code can produce a depth map, the code can be optimised efficiently, jointly with pose variables and together with the codes of overlapping keyframes, to attain global consistency. Conditioning the depth map on the image allows the code to represent only those aspects of the local geometry which cannot directly be predicted from the image. We explain how to learn our code representation, and demonstrate its advantageous properties in monocular SLAM.
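As a rough illustration of the idea of a compact code that generates a dense depth map, the sketch below optimises a 32-parameter code for a single keyframe. It is a toy under stated assumptions, not the paper's method: the decoder is assumed to be linear in the code, the image-conditioned part is replaced by a fixed random `base_depth`, the learned decoder by a random `basis`, and the code is fitted to direct depth observations instead of the multi-keyframe, pose-coupled objective the abstract describes. All names (`decode_depth`, `fit_code`, `prior_weight`) are hypothetical.

```python
import numpy as np


def decode_depth(base_depth, basis, code):
    """Dense depth = image-conditioned prediction + linear function of the code.

    Assumption: a linear decoder. The abstract does not specify the decoder form.
    """
    return base_depth + basis @ code


def fit_code(base_depth, basis, observed_depth, prior_weight=1e-3):
    """Solve for the code by regularised least squares.

    The ridge term acts as a zero-mean prior on the code (an assumption).
    """
    residual = observed_depth - base_depth
    code_size = basis.shape[1]
    normal_matrix = basis.T @ basis + prior_weight * np.eye(code_size)
    rhs = basis.T @ residual
    return np.linalg.solve(normal_matrix, rhs)


rng = np.random.default_rng(0)
num_pixels, code_size = 64 * 48, 32

# Stand-ins for quantities a trained network would provide (hypothetical data).
base_depth = rng.uniform(1.0, 5.0, num_pixels)     # depth predicted from the image alone
basis = rng.normal(size=(num_pixels, code_size))   # how each code entry changes the depth
true_code = rng.normal(size=code_size)

# Synthetic dense observation generated from a known code.
observed = decode_depth(base_depth, basis, true_code)

# Only 32 parameters are optimised, yet all 3072 depth values are recovered.
code = fit_code(base_depth, basis, observed)
recovered = decode_depth(base_depth, basis, code)
```

The point of the sketch is the dimensionality: the unknowns are the 32 code entries, not the 3072 per-pixel depths, which is what makes joint optimisation with other variables tractable in principle.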
URL
https://arxiv.org/abs/1804.00874