PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting

2024-10-29 15:28:15
Sunghwan Hong, Jaewoo Jung, Heeseong Shin, Jisang Han, Jiaolong Yang, Chong Luo, Seungryong Kim

Abstract

We consider the problem of novel view synthesis from unposed images in a single feed-forward pass. Our framework capitalizes on the speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS, and extends it into a practical solution that relaxes common assumptions such as dense image views, accurate camera poses, and substantial image overlaps. We achieve this by identifying and addressing a challenge unique to pixel-aligned 3DGS: misaligned 3D Gaussians across different views induce noisy or sparse gradients that destabilize training and hinder convergence, especially when the above assumptions are not met. To mitigate this, we employ pre-trained monocular depth estimation and visual correspondence models to achieve coarse alignment of the 3D Gaussians. We then introduce lightweight, learnable modules that refine the depth and pose estimates from the coarse alignment, improving the quality of 3D reconstruction and novel view synthesis. Furthermore, the refined estimates are used to compute geometry confidence scores, which assess the reliability of the 3D Gaussian centers and condition the prediction of the Gaussian parameters accordingly. Extensive evaluations on large-scale real-world datasets demonstrate that PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.
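
To illustrate why pixel-aligned Gaussians are sensitive to depth and pose errors, the sketch below lifts each pixel of a depth map to a 3D point that could serve as a Gaussian center. It is a generic pinhole unprojection written for this note, not the authors' implementation; the function name `unproject_depth` and the parameters `K` (3x3 intrinsics) and `cam_to_world` (4x4 pose) are assumptions made for the example.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Lift every pixel to a 3D world point (one candidate Gaussian center per pixel).

    Hypothetical helper: depth is (H, W), K is a 3x3 pinhole intrinsics matrix,
    cam_to_world is a 4x4 rigid camera-to-world transform.
    """
    h, w = depth.shape
    # Pixel centers in homogeneous image coordinates (u, v, 1).
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Back-project through the inverse intrinsics, then scale each ray by its depth.
    rays = pixels @ np.linalg.inv(K).T
    points_cam = rays * depth.reshape(-1, 1)
    # Transform from camera coordinates to world coordinates.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return points_cam @ R.T + t
```

Because every center depends directly on the per-pixel depth and the camera pose, an error in either one shifts all centers from that view, which is the cross-view misalignment the abstract describes.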

URL

https://arxiv.org/abs/2410.22128

PDF

https://arxiv.org/pdf/2410.22128.pdf

