Abstract
Reconstructing scenes and tracking motion are two sides of the same coin. Tracking points allows for geometric reconstruction [14], while geometric reconstruction of (dynamic) scenes allows for 3D tracking of points over time [24, 39]. The latter was recently also exploited for 2D point tracking, overcoming occlusion ambiguities by lifting tracking directly into 3D [38]. However, the above approaches either require offline processing or multi-view camera setups, both of which are unrealistic for real-world applications like robot navigation or mixed reality. We target the challenge of online 2D and 3D point tracking from unposed monocular camera input, introducing Dynamic Online Monocular Reconstruction (DynOMo). We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion. Our approach extends 3D Gaussians to capture new content and object motions while estimating camera movement from a single RGB frame. DynOMo stands out by enabling the emergence of point trajectories through robust image feature reconstruction and a novel similarity-enhanced regularization term, without requiring any correspondence-level supervision. It sets the first baseline for online point tracking with an unposed monocular camera, achieving performance on par with existing methods. We aim to inspire the community to advance online point tracking and reconstruction, expanding applicability to diverse real-world scenarios.
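The abstract only names the "similarity-enhanced regularization term" without defining it. As a rough illustration of what such a term could look like, below is a minimal PyTorch sketch of a rigidity regularizer on Gaussian centers whose pairwise penalties are weighted by reconstructed feature similarity, in the spirit of rigidity losses common in dynamic Gaussian splatting. All names (`similarity_weighted_rigidity_loss`, `mu_t`, `mu_prev`, `feats`, `k`) and the exact form of the loss are assumptions, not the authors' implementation.

```python
import torch

def similarity_weighted_rigidity_loss(mu_t, mu_prev, feats, k=8):
    """Hypothetical sketch (not the paper's code): encourage neighboring
    Gaussians to move rigidly together, weighting each neighbor pair by
    the similarity of their reconstructed image features.

    mu_t, mu_prev: (N, 3) Gaussian centers at the current / previous frame.
    feats:         (N, D) per-Gaussian feature vectors.
    """
    # k-nearest neighbors based on the previous frame's geometry.
    dists = torch.cdist(mu_prev, mu_prev)                   # (N, N)
    knn = dists.topk(k + 1, largest=False).indices[:, 1:]   # drop self

    # Feature similarity between each Gaussian and its neighbors;
    # similar features suggest points on the same (rigid) object part.
    f = torch.nn.functional.normalize(feats, dim=-1)
    sim = (f.unsqueeze(1) * f[knn]).sum(-1).clamp(min=0)    # (N, k)

    # Penalize changes in relative offsets between frames, so that
    # feature-similar neighbors are pulled toward a common motion.
    off_prev = mu_prev[knn] - mu_prev.unsqueeze(1)          # (N, k, 3)
    off_t = mu_t[knn] - mu_t.unsqueeze(1)
    return (sim * (off_t - off_prev).norm(dim=-1)).mean()

# Usage on random data, purely to show the shapes involved:
N, D = 1024, 32
loss = similarity_weighted_rigidity_loss(
    torch.randn(N, 3), torch.randn(N, 3), torch.randn(N, D))
```

The key design choice this sketch illustrates is that the similarity weighting lets trajectories emerge without correspondence-level supervision: no ground-truth point matches enter the loss, only per-Gaussian features and geometry.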
URL
https://arxiv.org/abs/2409.02104