Paper Reading AI Learner

Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis

2024-10-30 08:51:29
Zhiyuan Min, Yawei Luo, Jianwen Sun, Yi Yang

Abstract

Generalizable 3D Gaussian splatting (3DGS) can reconstruct new scenes from sparse-view observations in a feed-forward inference manner, eliminating the need for scene-specific retraining required in conventional 3DGS. However, existing methods rely heavily on epipolar priors, which can be unreliable in complex real-world scenes, particularly in non-overlapping and occluded regions. In this paper, we propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis that operates independently of epipolar line constraints. To enhance multiview feature extraction with 3D perception, we employ a self-supervised Vision Transformer (ViT) with cross-view completion pre-training on large-scale datasets. Additionally, we introduce an Iterative Cross-view Gaussians Alignment method to ensure consistent depth scales across different views. Our eFreeSplat represents an innovative approach for generalizable novel view synthesis. Unlike existing purely geometry-free methods, eFreeSplat focuses on achieving epipolar-free feature matching and encoding by providing 3D priors through cross-view pre-training. We evaluate eFreeSplat on wide-baseline novel view synthesis tasks using the RealEstate10K and ACID datasets. Extensive experiments demonstrate that eFreeSplat surpasses state-of-the-art baselines that rely on epipolar priors, achieving superior geometry reconstruction and novel view synthesis quality. Project page: this https URL.

URL

https://arxiv.org/abs/2410.22817

PDF

https://arxiv.org/pdf/2410.22817.pdf

