SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images

2025-01-08 18:52:03
Zixuan Huang, Mark Boss, Aaryaman Vasishta, James M. Rehg, Varun Jampani

Abstract

We study the problem of single-image 3D object reconstruction. Recent works have diverged into two directions: regression-based modeling and generative modeling. Regression methods efficiently infer visible surfaces, but struggle with occluded regions. Generative methods handle uncertain regions better by modeling distributions, but are computationally expensive and the generation is often misaligned with visible surfaces. In this paper, we present SPAR3D, a novel two-stage approach aiming to take the best of both directions. The first stage of SPAR3D generates sparse 3D point clouds using a lightweight point diffusion model, which has a fast sampling speed. The second stage uses both the sampled point cloud and the input image to create highly detailed meshes. Our two-stage design enables probabilistic modeling of the ill-posed single-image 3D task while maintaining high computational efficiency and great output fidelity. Using point clouds as an intermediate representation further allows for interactive user edits. Evaluated on diverse datasets, SPAR3D demonstrates superior performance over previous state-of-the-art methods, at an inference speed of 0.7 seconds. Project page with code and model: this https URL
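The two-stage data flow described in the abstract can be sketched roughly as follows. This is only an illustration of the structure (sample a sparse point cloud, then condition a second stage on both the points and the image), not the authors' implementation. Every name here (`toy_denoiser`, `sample_sparse_points`, `reconstruct_mesh`, `image_scale`) is hypothetical, the "denoiser" is a hand-written stand-in for a learned point diffusion model, and the second stage returns a summary dictionary instead of a real mesh.

```python
import math
import random


def toy_denoiser(points, image_scale):
    """Hypothetical stand-in for a learned point denoiser.

    Projects every point onto a sphere of radius `image_scale`.
    A real model would be a trained network conditioned on image features.
    """
    cleaned = []
    for x, y, z in points:
        norm = math.sqrt(x * x + y * y + z * z) or 1.0
        cleaned.append((image_scale * x / norm,
                        image_scale * y / norm,
                        image_scale * z / norm))
    return cleaned


def sample_sparse_points(image_scale, num_points=512, steps=10, seed=0):
    """Stage 1 (sketch): refine Gaussian noise into a sparse point cloud."""
    rng = random.Random(seed)
    points = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
              for _ in range(num_points)]
    for step in range(steps, 0, -1):
        predicted = toy_denoiser(points, image_scale)
        blend = 1.0 / step  # reaches 1.0 on the last step
        points = [tuple(p + blend * (q - p) for p, q in zip(pt, pred))
                  for pt, pred in zip(points, predicted)]
    return points


def reconstruct_mesh(points, image_scale):
    """Stage 2 (placeholder): would consume points AND the image.

    Returns a summary dict instead of an actual mesh.
    """
    return {"num_conditioning_points": len(points),
            "image_scale": image_scale}


def reconstruct(image_scale, num_points=512):
    """Run both stages; points are exposed so a user could edit them."""
    points = sample_sparse_points(image_scale, num_points)
    return points, reconstruct_mesh(points, image_scale)
```

Because the intermediate point cloud is returned explicitly, a caller could modify `points` (for example, move or delete some of them) before passing them to `reconstruct_mesh`, which is the kind of interactive edit the abstract mentions.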

URL

https://arxiv.org/abs/2501.04689

PDF

https://arxiv.org/pdf/2501.04689.pdf

