DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction

2024-09-03 17:58:03
Jenny Seidenschwarz, Qunjie Zhou, Bardienus Duisterhof, Deva Ramanan, Laura Leal-Taixé

Abstract

Reconstructing scenes and tracking motion are two sides of the same coin. Tracking points allows for geometric reconstruction [14], while geometric reconstruction of (dynamic) scenes allows for 3D tracking of points over time [24, 39]. The latter was recently also exploited for 2D point tracking, overcoming occlusion ambiguities by lifting tracking directly into 3D [38]. However, the above approaches require either offline processing or multi-view camera setups, both of which are unrealistic for real-world applications like robot navigation or mixed reality. We target the challenge of online 2D and 3D point tracking from unposed monocular camera input by introducing Dynamic Online Monocular Reconstruction (DynOMo). We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion. Our approach extends 3D Gaussians to capture new content and object motions while estimating camera movements from a single RGB frame. DynOMo stands out by enabling point trajectories to emerge through robust image feature reconstruction and a novel similarity-enhanced regularization term, without requiring any correspondence-level supervision. It sets the first baseline for online point tracking with monocular unposed cameras, achieving performance on par with existing methods. We aim to inspire the community to advance online point tracking and reconstruction, expanding their applicability to diverse real-world scenarios.
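
To make the link between 3D and 2D tracking concrete, the sketch below projects one 3D point trajectory into each frame with a standard pinhole camera model. It is only an illustration under stated assumptions, not the DynOMo implementation: the function name `project_track`, the array layouts, and the idea that the per-frame 3D positions and camera poses are already available (for example from a reconstruction) are all hypothetical choices made for this example.

```python
import numpy as np

def project_track(points_world, world_to_cam, K):
    """Project a 3D point trajectory into each frame's image plane.

    Hypothetical helper for illustration only.
    points_world: (T, 3) world-space position of one tracked point per frame.
    world_to_cam: (T, 4, 4) per-frame world-to-camera rigid transforms.
    K:            (3, 3) pinhole intrinsics.
    Returns (T, 2) pixel coordinates and a (T,) mask that is True where
    the point lies in front of the camera.
    """
    T = points_world.shape[0]
    # Homogeneous coordinates so a 4x4 transform can be applied per frame.
    homog = np.concatenate([points_world, np.ones((T, 1))], axis=1)
    cam = np.einsum("tij,tj->ti", world_to_cam, homog)[:, :3]
    in_front = cam[:, 2] > 1e-6
    # Use a dummy depth of 1 for points behind the camera to avoid dividing by zero;
    # their pixel coordinates are not meaningful and are flagged by the mask.
    depth = np.where(in_front, cam[:, 2], 1.0)
    pix = (K @ cam.T).T
    uv = pix[:, :2] / depth[:, None]
    return uv, in_front

# Usage: a static camera at the origin and a point moving along x at depth 2.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
poses = np.stack([np.eye(4), np.eye(4)])
track_3d = np.array([[0.0, 0.0, 2.0],
                     [1.0, 0.0, 2.0]])
track_2d, visible = project_track(track_3d, poses, K)
```

In this sketch a 2D track is a by-product of the 3D track and the camera poses, which is why a 3D representation can keep following a point through occlusion while its projected position stays well defined.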

URL

https://arxiv.org/abs/2409.02104

PDF

https://arxiv.org/pdf/2409.02104.pdf

