RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting

2024-04-30 16:54:59
Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou

Abstract

We propose RTG-SLAM, a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. RTG-SLAM features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant colors, and transparent ones fitting residual colors. By rendering depth in a different way from color rendering, we let a single opaque Gaussian well fit a local surface region without the need of multiple overlapping Gaussians, hence largely reducing the memory and computation cost. For on-the-fly Gaussian optimization, we explicitly add Gaussians for three types of pixels per frame: newly observed, with large color errors and with large depth errors. We also categorize all Gaussians into stable and unstable ones, where the stable Gaussians are expected to well fit previously observed RGBD images and otherwise unstable. We only optimize the unstable Gaussians and only render the pixels occupied by unstable Gaussians. In this way, both the number of Gaussians to be optimized and pixels to be rendered are largely reduced, and the optimization can be done in real time. We show real-time reconstructions of a variety of real large scenes. Compared with the state-of-the-art NeRF-based RGBD SLAM, our system achieves comparable high-quality reconstruction but with around twice the speed and half the memory cost, and shows superior performance in the realism of novel view synthesis and camera tracking accuracy.
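The per-frame logic described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the thresholds, the `Pixel` fields, the use of accumulated opacity to detect newly observed pixels, and the function names are all assumptions made for illustration. How a Gaussian is judged stable is not specified in the abstract, so the stability flag is simply taken as an input here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

# Hypothetical thresholds; the abstract gives no numeric values.
COVERAGE_THRESH = 0.5    # below this, treat the pixel as newly observed
COLOR_ERR_THRESH = 0.1   # absolute intensity error
DEPTH_ERR_THRESH = 0.05  # absolute depth error, in metres


@dataclass
class Pixel:
    observed_color: float  # grayscale intensity in [0, 1], for brevity
    rendered_color: float
    observed_depth: float
    rendered_depth: float
    coverage: float        # assumed: accumulated opacity of the map at this pixel


def classify_pixel(p: Pixel) -> Optional[str]:
    """Return why a new Gaussian would be added at this pixel, or None."""
    if p.coverage < COVERAGE_THRESH:
        return "newly_observed"
    if abs(p.observed_color - p.rendered_color) > COLOR_ERR_THRESH:
        return "color_error"
    if abs(p.observed_depth - p.rendered_depth) > DEPTH_ERR_THRESH:
        return "depth_error"
    return None


def unstable_pixels(
    footprints: Dict[int, Set[Tuple[int, int]]],
    is_stable: Dict[int, bool],
) -> Set[Tuple[int, int]]:
    """Collect the pixels covered by unstable Gaussians.

    Only these pixels would be rendered, and only the unstable
    Gaussians would be optimized.
    """
    pixels: Set[Tuple[int, int]] = set()
    for gaussian_id, footprint in footprints.items():
        if not is_stable[gaussian_id]:
            pixels |= footprint
    return pixels
```

Because stable Gaussians are skipped, both the set of parameters to optimize and the set of pixels to render shrink as more of the scene is marked stable, which is the source of the real-time behavior the abstract claims.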

URL

https://arxiv.org/abs/2404.19706

PDF

https://arxiv.org/pdf/2404.19706.pdf

