Abstract
We study the problem of single-image 3D object reconstruction. Recent works have diverged into two directions: regression-based modeling and generative modeling. Regression methods efficiently infer visible surfaces, but struggle with occluded regions. Generative methods handle uncertain regions better by modeling distributions, but are computationally expensive, and their outputs are often misaligned with visible surfaces. In this paper, we present SPAR3D, a novel two-stage approach that aims to combine the strengths of both directions. The first stage of SPAR3D generates sparse 3D point clouds using a lightweight point diffusion model with fast sampling. The second stage uses both the sampled point cloud and the input image to create highly detailed meshes. Our two-stage design enables probabilistic modeling of the ill-posed single-image 3D task while maintaining high computational efficiency and output fidelity. Using point clouds as an intermediate representation further allows for interactive user edits. Evaluated on diverse datasets, SPAR3D demonstrates superior performance over previous state-of-the-art methods, at an inference speed of 0.7 seconds. Project page with code and model: this https URL
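The two-stage design described in the abstract (a fast point-diffusion stage followed by an image-plus-point-cloud meshing stage) can be sketched as a minimal mock pipeline. Everything below is illustrative: the function names, the few-step "denoising" loop, and the trivial meshing are placeholders assumed for this sketch, not the authors' actual SPAR3D API or models.

```python
import numpy as np

def stage1_point_diffusion(image, num_points=512, steps=4, rng=None):
    """Stage 1 (mock): a lightweight diffusion model samples a sparse 3D
    point cloud conditioned on the input image. Here the learned denoiser
    is replaced by a toy update that pulls Gaussian noise toward a fake
    image-conditioned mean, mimicking few-step (hence fast) sampling."""
    rng = rng or np.random.default_rng(0)
    cond = image.mean()                      # stand-in for an image embedding
    x = rng.standard_normal((num_points, 3)) # start from pure noise
    for _ in range(steps):
        x = 0.5 * x + 0.5 * cond             # mock denoising step
    return x

def stage2_mesh_from_points(image, points):
    """Stage 2 (mock): a regression model consumes both the sampled point
    cloud (global shape, including occluded regions) and the image
    (visible-surface detail) and outputs a mesh. Mocked here as the raw
    points plus trivial face indices."""
    vertices = points                        # in reality: refined and densified
    n = len(points) - len(points) % 3
    faces = np.arange(n).reshape(-1, 3)      # placeholder connectivity
    return vertices, faces

# Dummy single input image drives both stages.
image = np.full((64, 64), 0.2)
pts = stage1_point_diffusion(image)
verts, faces = stage2_mesh_from_points(image, pts)
print(pts.shape, verts.shape, faces.shape)
```

Because stage 1 only has to produce a sparse point cloud, its sampler can be small and fast; the heavy lifting of surface detail is left to the deterministic stage 2, which is conditioned on the visible pixels. The point cloud in the middle is also what makes interactive edits possible: a user can move or delete points before stage 2 runs.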
URL
https://arxiv.org/abs/2501.04689