Abstract
In this paper, we study the problem of 3D reconstruction from a single-view RGB image and propose a novel approach called DIG3D for 3D object reconstruction and novel view synthesis. Our method uses an encoder-decoder framework in which the decoder generates 3D Gaussians guided by depth-aware image features from the encoder. In particular, we introduce a deformable transformer, which enables efficient and effective decoding through 3D reference points and multi-layer refinement. By harnessing the benefits of 3D Gaussians, our approach offers an efficient and accurate solution for 3D reconstruction from single-view images. We evaluate our method on the ShapeNet SRN dataset, achieving PSNRs of 24.21 and 24.98 on the car and chair subsets, respectively. These results outperform the recent state of the art by around 2.25%, demonstrating the effectiveness of our method.
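As a rough illustration of the decoding step the abstract describes, the sketch below maps one decoder query plus its 3D reference point to the parameters of a single 3D Gaussian (center, scale, opacity, color). This is a hypothetical, numpy-only sketch, not the authors' implementation: the function name, the linear heads `W_offset` and `W_attr`, and the choice to omit the rotation quaternion are all assumptions for brevity.

```python
import numpy as np

def decode_gaussian(query_feat, ref_point, W_offset, W_attr):
    """Hypothetical sketch: map one decoder query to 3D Gaussian parameters.

    query_feat : (d,) feature vector for this decoder query
    ref_point  : (3,) current 3D reference point (refined layer by layer
                 in a deformable-transformer-style decoder)
    W_offset   : (d, 3) linear head predicting a position offset
    W_attr     : (d, 7) linear head predicting scale (3), opacity (1),
                 color (3); a real Gaussian also carries a rotation
                 quaternion, omitted here for brevity
    """
    # Refine the reference point into the Gaussian center.
    center = ref_point + query_feat @ W_offset
    attrs = query_feat @ W_attr
    scale = np.exp(attrs[:3])                  # exp keeps scales positive
    opacity = 1.0 / (1.0 + np.exp(-attrs[3]))  # sigmoid maps into (0, 1)
    color = 1.0 / (1.0 + np.exp(-attrs[4:7]))  # RGB in (0, 1)
    return center, scale, opacity, color

# Toy usage with random weights standing in for learned ones.
rng = np.random.default_rng(0)
d = 16
q = rng.standard_normal(d)
center, scale, opacity, color = decode_gaussian(
    q, np.zeros(3),
    rng.standard_normal((d, 3)) * 0.01,
    rng.standard_normal((d, 7)) * 0.01)
print(center.shape, scale.shape, color.shape)  # (3,) (3,) (3,)
```

In a multi-layer decoder, the predicted `center` would feed back in as the next layer's `ref_point`, which is the iterative refinement the abstract refers to.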
URL
https://arxiv.org/abs/2404.16323