JointSplat: Probabilistic Joint Flow-Depth Optimization for Sparse-View Gaussian Splatting

2025-06-04 12:04:40
Yang Xiao, Guoan Xu, Qiang Wu, Wenjing Jia

Abstract

Reconstructing 3D scenes from sparse viewpoints is a long-standing challenge with wide applications. Recent advances in feed-forward 3D Gaussian sparse-view reconstruction methods provide an efficient solution for real-time novel view synthesis by leveraging geometric priors learned from large-scale multi-view datasets and computing 3D Gaussian centers via back-projection. Despite offering strong geometric cues, both feed-forward multi-view depth estimation and flow-depth joint estimation face key limitations: the former suffers from mislocalization and artifacts in low-texture or repetitive regions, while the latter is prone to local noise and global inconsistency due to unreliable matches when ground-truth flow supervision is unavailable. To overcome these limitations, we propose JointSplat, a unified framework that leverages the complementarity between optical flow and depth via a novel probabilistic optimization mechanism. Specifically, this pixel-level mechanism scales the information fusion between depth and flow based on the matching probability of optical flow during training. Building upon this mechanism, we further propose a novel multi-view depth-consistency loss that exploits reliable supervision while suppressing misleading gradients in uncertain areas. Evaluated on RealEstate10K and ACID, JointSplat consistently outperforms state-of-the-art (SOTA) methods, demonstrating the effectiveness and robustness of our proposed probabilistic joint flow-depth optimization approach for high-fidelity sparse-view 3D reconstruction.
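
As a rough illustration of two ideas named in the abstract, the sketch below blends two per-pixel depth estimates using a matching probability and then back-projects the result to 3D points with a standard pinhole camera model. It is not the authors' implementation: the linear blend `p * depth_flow + (1 - p) * depth_mv`, the function names `fuse_depth` and `back_project`, and the toy intrinsics are all assumptions made for this example, since the abstract does not specify the actual fusion rule.

```python
from typing import List, Tuple

DepthMap = List[List[float]]
Point = Tuple[float, float, float]


def fuse_depth(depth_mv: DepthMap, depth_flow: DepthMap,
               match_prob: DepthMap) -> DepthMap:
    """Per-pixel blend of two depth maps (hypothetical fusion rule).

    Where the flow match probability p is high, the flow-derived depth
    dominates; where it is low, the multi-view depth dominates.
    """
    fused = []
    for row_mv, row_flow, row_p in zip(depth_mv, depth_flow, match_prob):
        fused.append([p * d_flow + (1.0 - p) * d_mv
                      for d_mv, d_flow, p in zip(row_mv, row_flow, row_p)])
    return fused


def back_project(depth: DepthMap, fx: float, fy: float,
                 cx: float, cy: float) -> List[Point]:
    """Lift each pixel (u, v) with depth d to a camera-space 3D point
    using the pinhole model: X = (u - cx) / fx * d, Y = (v - cy) / fy * d, Z = d.
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            points.append(((u - cx) / fx * d, (v - cy) / fy * d, d))
    return points


# Toy 2x2 example with made-up values (assumptions, not data from the paper).
depth_mv = [[2.0, 2.0], [2.0, 2.0]]      # multi-view depth estimate
depth_flow = [[4.0, 4.0], [4.0, 4.0]]    # flow-derived depth estimate
match_prob = [[1.0, 0.0], [0.5, 0.5]]    # flow matching probability per pixel

fused = fuse_depth(depth_mv, depth_flow, match_prob)
centers = back_project(fused, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

In this toy case the pixel with probability 1.0 takes the flow-derived depth, the pixel with probability 0.0 keeps the multi-view depth, and the remaining pixels land halfway between the two. The resulting points stand in for the 3D Gaussian centers the abstract describes as being obtained via back-projection.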

URL

https://arxiv.org/abs/2506.03872

PDF

https://arxiv.org/pdf/2506.03872.pdf
