Abstract
Simultaneous localization and mapping (SLAM) is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking using loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves a 3x improvement in tracking accuracy and a 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state of the art, while allowing real-time rendering of a high-resolution dense 3D map. Project Webpage: this https URL
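The abstract describes a tracking objective that combines photometric rendering error, depth consistency, and a pose prior from pre-integrated IMU measurements. A minimal sketch of such a combined loss is shown below; the function name, the weights, and the use of simple L1/L2 terms are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def tracking_loss(rendered, observed, rendered_depth, est_depth,
                  pose_delta, imu_delta,
                  w_photo=1.0, w_depth=0.1, w_imu=0.5):
    """Hypothetical combined tracking loss (illustrative only):
    - photometric error between the rendered and observed image,
    - depth error against a depth estimate,
    - deviation of the estimated relative pose from the
      IMU pre-integrated relative pose."""
    photo = np.abs(rendered - observed).mean()          # L1 photometric term
    depth = np.abs(rendered_depth - est_depth).mean()   # depth consistency term
    imu = np.linalg.norm(pose_delta - imu_delta)        # IMU pose-prior term
    return w_photo * photo + w_depth * depth + w_imu * imu

# Usage sketch: identical inputs give zero loss.
img = np.zeros((4, 4, 3))
d = np.zeros((4, 4))
xi = np.zeros(6)  # 6-DoF relative pose as a tangent-space vector (assumed)
print(tracking_loss(img, img, d, d, xi, xi))  # → 0.0
```

In a real system each term would be minimized with respect to the camera pose by gradient descent through the differentiable Gaussian rasterizer; this sketch only shows how the three measurement sources could enter one scalar objective.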
URL
https://arxiv.org/abs/2404.00923