Abstract
Current 3D reconstruction techniques struggle to faithfully infer unbounded scenes from a few images. Specifically, existing methods have high computational demands, require detailed pose information, and cannot reliably reconstruct occluded regions. We introduce 6Img-to-3D, an efficient, scalable transformer-based encoder-renderer method for single-shot image-to-3D reconstruction. Our method outputs a 3D-consistent parameterized triplane from only six outward-facing input images for large-scale, unbounded outdoor driving scenarios. We take a step towards resolving existing shortcomings by combining contracted custom cross- and self-attention mechanisms for triplane parameterization, differentiable volume rendering, scene contraction, and image feature projection. We show that six surround-view vehicle images from a single timestamp, without global pose information, are enough to reconstruct 360$^{\circ}$ scenes at inference time in 395 ms. Our method allows, for example, rendering third-person images and bird's-eye views. Our code is available at this https URL, and more examples can be found at our website at this https URL.
URL
https://arxiv.org/abs/2404.12378