Paper Reading AI Learner

Improving Novel view synthesis of 360$^\circ$ Scenes in Extremely Sparse Views by Jointly Training Hemisphere Sampled Synthetic Images

2025-05-25 18:42:34
Guangan Chen, Anh Minh Truong, Hanhe Lin, Michiel Vlaminck, Wilfried Philips, Hiep Luong

Abstract

Novel view synthesis in 360$^\circ$ scenes from extremely sparse input views is essential for applications like virtual reality and augmented reality. This paper presents a novel framework for novel view synthesis in extremely sparse-view cases. As typical structure-from-motion methods are unable to estimate camera poses in extremely sparse-view cases, we apply DUSt3R to estimate camera poses and generate a dense point cloud. Using the estimated camera poses, we densely sample additional views from the upper hemisphere of the scene and render synthetic images of the dense point cloud from these views. Training a 3D Gaussian Splatting model on the combination of reference images from the sparse input views and the densely sampled synthetic images provides larger scene coverage in 3D space, addressing the overfitting caused by the limited input in sparse-view cases. By retraining a diffusion-based image enhancement model on our created dataset, we further improve the quality of the point-cloud-rendered images by removing artifacts. We compare our framework with benchmark methods in the case of only four input views, demonstrating significant improvement in novel view synthesis under extremely sparse-view conditions for 360$^\circ$ scenes.
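
The hemisphere-sampling step described above can be made concrete with a short sketch. The snippet below is an illustrative reconstruction under stated assumptions, not the authors' implementation: it assumes the scene center lies at the origin, a fixed camera radius, and OpenCV-style extrinsics (x right, y down, z forward); the function names (look_at, sample_hemisphere_poses) and sampling counts (24 azimuths × 6 elevations) are hypothetical.

import numpy as np

def look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    # Camera-to-world rotation looking from `eye` toward `target`,
    # OpenCV convention: x right, y down, z forward.
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)
    # Columns are the camera axes expressed in world coordinates.
    return np.stack([right, down, forward], axis=1)

def sample_hemisphere_poses(n_az=24, n_el=6, radius=3.0):
    # Densely sample camera positions on the upper hemisphere around the
    # scene center; return world-to-camera (R, t) extrinsics per view.
    poses = []
    for el in np.linspace(np.deg2rad(10), np.deg2rad(80), n_el):  # stay off the pole
        for az in np.linspace(0.0, 2.0 * np.pi, n_az, endpoint=False):
            eye = radius * np.array([np.cos(el) * np.cos(az),
                                     np.cos(el) * np.sin(az),
                                     np.sin(el)])
            R_c2w = look_at(eye)
            R_w2c = R_c2w.T
            t_w2c = -R_w2c @ eye
            poses.append((R_w2c, t_w2c))
    return poses

poses = sample_hemisphere_poses()
print(f"{len(poses)} synthetic camera poses sampled on the upper hemisphere")

Each (R, t) pair would then be used to rasterize the DUSt3R point cloud into a synthetic image for joint 3D Gaussian Splatting training; the elevation range stops short of the pole so that the up vector never becomes degenerate.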

URL

https://arxiv.org/abs/2505.19264

PDF

https://arxiv.org/pdf/2505.19264.pdf

