NeSLAM: Neural Implicit Mapping and Self-Supervised Feature Tracking With Depth Completion and Denoising

Abstract

Recent years have seen significant advances in 3D reconstruction and dense RGB-D SLAM systems. One notable development is the application of Neural Radiance Fields (NeRF) to these systems, using an implicit neural representation to encode 3D scenes. This extension of NeRF to SLAM has shown promising results. However, depth images from consumer-grade RGB-D sensors are often sparse and noisy, which poses significant challenges for 3D reconstruction and degrades the accuracy of the scene geometry representation. Moreover, the original hierarchical feature grid with occupancy values represents scene geometry inaccurately. Furthermore, existing methods select random pixels for camera tracking, which leads to inaccurate localization and is not robust in real-world indoor environments. To this end, we present NeSLAM, an advanced framework that achieves accurate and dense depth estimation, robust camera tracking, and realistic synthesis of novel views. First, a depth completion and denoising network is designed to provide a dense geometry prior and guide the optimization of the neural implicit representation. Second, the occupancy scene representation is replaced with a hierarchical Signed Distance Field (SDF) scene representation for high-quality reconstruction and view synthesis. Third, we propose a NeRF-based self-supervised feature tracking algorithm for robust real-time tracking. Experiments on various indoor datasets demonstrate the effectiveness and accuracy of the system in reconstruction, tracking quality, and novel view synthesis.

Abstract (translated)

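In recent years, 3D modeling and dense RGB-D SLAM systems have made significant progress. One notable development is the application of Neural Radiance Fields (NeRF) to these systems, using implicit neural representations to encode 3D scenes. Extending NeRF to SLAM systems in this way has achieved good results. However, the depth images obtained from consumer-grade RGB-D sensors are usually sparse and heavily noisy, which poses major challenges for 3D modeling and for the accuracy of the scene geometry representation. In addition, the occupancy values of the original hierarchical feature grid are inaccurate, and existing methods select random pixels for camera tracking, which leads to inaccurate localization and is not robust enough in real-world indoor environments. We therefore propose NeSLAM, a high-performance framework that achieves accurate and dense depth estimation, robust camera tracking, and realistic scene synthesis. First, a depth completion and denoising network is designed to provide rich geometric information and guide the optimization of the neural implicit representation. Second, the occupancy scene representation is replaced with a Signed Distance Field (SDF) hierarchical scene representation to achieve high-quality reconstruction and view synthesis. In addition, we propose a NeRF-based self-supervised feature tracking algorithm for real-time tracking. Experiments on various indoor datasets demonstrate the effectiveness and accuracy of the system in modeling, tracking quality, and scene synthesis.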

URL

https://arxiv.org/abs/2403.20034

PDF

https://arxiv.org/pdf/2403.20034.pdf
