Abstract
Current Simultaneous Localization and Mapping (SLAM) methods based on Neural Radiance Fields (NeRF) or 3D Gaussian Splatting excel at reconstructing static 3D scenes but struggle with tracking and reconstruction in dynamic environments, such as real-world scenes containing moving elements. Existing NeRF-based SLAM approaches that address dynamic challenges typically rely on RGB-D input; few methods accommodate pure RGB input. To overcome these limitations, we propose Dy3DGS-SLAM, the first 3D Gaussian Splatting (3DGS) SLAM method for dynamic scenes using monocular RGB input. To handle dynamic interference, we fuse optical flow masks and depth masks through a probabilistic model to obtain a fused dynamic mask. With only a single network iteration, this mask constrains the tracking scale and refines the rendered geometry. Based on the fused dynamic mask, we design a novel motion loss that constrains the pose estimation network during tracking. For mapping, we use rendering losses over dynamic pixels, color, and depth to eliminate transient interference and occlusions caused by dynamic objects. Experimental results demonstrate that Dy3DGS-SLAM achieves state-of-the-art tracking and rendering in dynamic environments, outperforming or matching existing RGB-D methods.
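The abstract does not specify the exact probabilistic model used to fuse the optical flow and depth masks. As a minimal illustrative sketch only, one common way to combine two per-pixel dynamic-probability cues is a noisy-OR: a pixel is treated as static only if both cues agree it is static. All names and the 0.5 threshold below are assumptions, not the paper's method.

```python
import numpy as np

def fuse_dynamic_masks(p_flow: np.ndarray, p_depth: np.ndarray) -> np.ndarray:
    """Fuse per-pixel dynamic probabilities from two cues (hypothetical sketch).

    p_flow, p_depth: arrays in [0, 1], each the probability that a pixel
    belongs to a moving object, as estimated from optical flow and from
    depth residuals respectively.
    Noisy-OR fusion: P(dynamic) = 1 - P(static_flow) * P(static_depth).
    Returns a boolean dynamic mask.
    """
    p_fused = 1.0 - (1.0 - p_flow) * (1.0 - p_depth)
    return p_fused > 0.5  # threshold is an arbitrary illustrative choice

# Toy 2x2 example: top-left pixel is flagged by both cues, others are not.
p_flow = np.array([[0.9, 0.1], [0.4, 0.05]])
p_depth = np.array([[0.8, 0.2], [0.5, 0.1]])
mask = fuse_dynamic_masks(p_flow, p_depth)
```

In a SLAM pipeline such a mask would gate which pixels contribute to the photometric and depth losses, so moving objects neither corrupt pose estimation nor leave ghost geometry in the map.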
URL
https://arxiv.org/abs/2506.05965