Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering

2023-11-30 13:53:50
Yurui Chen, Chun Gu, Junzhe Jiang, Xiatian Zhu, Li Zhang

Abstract

Modeling dynamic, large-scale urban scenes is challenging due to their highly intricate geometric structures and unconstrained dynamics in both space and time. Prior methods often employ high-level architectural priors that separate static and dynamic elements, resulting in suboptimal capture of their synergistic interactions. To address this challenge, we present a unified representation model, called Periodic Vibration Gaussian (PVG). PVG builds upon the efficient 3D Gaussian splatting technique, originally designed for static scene representation, by introducing periodic vibration-based temporal dynamics. This innovation enables PVG to elegantly and uniformly represent the characteristics of various objects and elements in dynamic urban scenes. To enhance temporally coherent representation learning with sparse training data, we introduce a novel flow-based temporal smoothing mechanism and a position-aware adaptive control strategy. Extensive experiments on the Waymo Open Dataset and KITTI benchmarks demonstrate that PVG surpasses state-of-the-art alternatives in both reconstruction and novel view synthesis for both dynamic and static scenes. Notably, PVG achieves this without relying on manually labeled object bounding boxes or expensive optical flow estimation. Moreover, PVG exhibits a 50-fold acceleration in training and a 6000-fold acceleration in rendering over the best alternative.
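
The abstract does not spell out the formulation, so the following is only an illustrative sketch of what "periodic vibration-based temporal dynamics" could look like for a single Gaussian. It assumes a sinusoidal displacement of the mean around a peak time and a Gaussian falloff of opacity away from that time. The function name, the parameter names (`mu`, `v`, `o`, `tau`, `l`, `beta`), and the exact functional forms are assumptions made for illustration, not the authors' implementation.

```python
import math


def vibrating_gaussian(mu, v, o, t, tau, l, beta):
    """Hypothetical sketch: state of one time-dependent Gaussian at time t.

    mu   -- base 3D position (tuple of floats)
    v    -- vibration direction and magnitude (tuple of floats)
    o    -- peak opacity
    tau  -- time at which the Gaussian is at mu with opacity o
    l    -- vibration period (assumed name)
    beta -- temporal lifespan controlling opacity decay (assumed name)
    """
    # Assumed sinusoidal displacement of the mean around tau.
    phase = 2.0 * math.pi * (t - tau) / l
    scale = l / (2.0 * math.pi) * math.sin(phase)
    position = tuple(m + scale * vi for m, vi in zip(mu, v))

    # Assumed Gaussian decay of opacity away from tau.
    opacity = o * math.exp(-0.5 * ((t - tau) / beta) ** 2)
    return position, opacity


# A Gaussian with zero vibration and a very long lifespan behaves as static.
static_pos, static_op = vibrating_gaussian(
    mu=(1.0, 2.0, 3.0), v=(0.0, 0.0, 0.0), o=0.8,
    t=5.0, tau=0.0, l=1.0, beta=1e6,
)

# A moving Gaussian drifts along v near tau and fades as |t - tau| grows.
moving_pos, moving_op = vibrating_gaussian(
    mu=(0.0, 0.0, 0.0), v=(1.0, 0.0, 0.0), o=1.0,
    t=0.25, tau=0.0, l=1.0, beta=0.5,
)
```

Under these assumptions, a single parameterization covers both cases the abstract mentions: static elements correspond to zero vibration with a long lifespan, and dynamic elements to nonzero vibration with a short one, which is one way to read the claim of a unified representation.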

URL

https://arxiv.org/abs/2311.18561

PDF

https://arxiv.org/pdf/2311.18561.pdf

