Abstract
We propose PyTorchGeoNodes, a differentiable module for reconstructing 3D objects from images using interpretable shape programs. Compared with traditional CAD model retrieval methods, shape programs make it possible to reason about the semantic properties of reconstructed objects and to edit them, and they have a low memory footprint. However, the use of shape programs for 3D scene understanding has been largely neglected in prior work. As our main contribution, we enable gradient-based optimization by introducing a module that translates shape programs, designed for example in Blender, into efficient PyTorch code. We also provide a method that relies on PyTorchGeoNodes and is inspired by Monte Carlo Tree Search (MCTS) to jointly optimize the discrete and continuous parameters of shape programs and reconstruct 3D objects for input scenes. In our experiments, we apply our algorithm to reconstruct 3D objects in the ScanNet dataset and evaluate our results against CAD model retrieval-based reconstructions. Our experiments indicate that our reconstructions match the input scenes well while enabling semantic reasoning about the reconstructed objects.
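To illustrate what jointly optimizing the discrete and continuous parameters of a shape program can look like, here is a minimal sketch in plain PyTorch. It does not use the PyTorchGeoNodes API, and every name in it (`table_points`, `chamfer`, `fit`) is hypothetical. The toy shape program outputs a few key points of a table, the continuous parameters (width, depth, height) are fitted by gradient descent on a squared Chamfer distance, and the discrete parameter (number of legs) is handled by exhaustive enumeration, which stands in for the MCTS-inspired search mentioned in the abstract.

```python
import torch

# Sign pattern for the four corners of a rectangle centered at the origin.
CORNER_SIGNS = torch.tensor([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])


def table_points(width, depth, height, num_legs):
    """Toy shape program (an assumption for illustration): key points of a table."""
    corners_xy = CORNER_SIGNS * torch.stack([width, depth]) / 2
    # Tabletop corners sit at z = height.
    top = torch.cat([corners_xy, height * torch.ones(4, 1)], dim=1)
    if num_legs == 4:
        # One foot below each tabletop corner, on the floor (z = 0).
        feet = torch.cat([corners_xy, torch.zeros(4, 1)], dim=1)
    else:
        # A single central foot at the origin.
        feet = torch.zeros(1, 3)
    return torch.cat([top, feet], dim=0)


def chamfer(a, b):
    """Symmetric Chamfer distance using squared Euclidean distances."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(dim=-1)
    return d2.min(dim=1).values.mean() + d2.min(dim=0).values.mean()


def fit(target, leg_options=(1, 4), steps=200, lr=0.1):
    """Enumerate the discrete choice; run gradient descent on the continuous ones."""
    best = None
    for num_legs in leg_options:
        params = torch.ones(3, requires_grad=True)  # width, depth, height
        opt = torch.optim.SGD([params], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = chamfer(table_points(*params, num_legs), target)
            loss.backward()
            opt.step()
        with torch.no_grad():
            final = chamfer(table_points(*params, num_legs), target).item()
        if best is None or final < best["loss"]:
            width, depth, height = params.detach().tolist()
            best = {"num_legs": num_legs, "width": width, "depth": depth,
                    "height": height, "loss": final}
    return best


# Synthetic target: a four-legged table with known dimensions.
target = table_points(torch.tensor(1.2), torch.tensor(0.8), torch.tensor(0.75), 4)
best = fit(target)
print(best)
```

Because the toy program is built from differentiable tensor operations, gradients flow from the point-set loss back to the dimensions. Enumeration is only practical here because there are two discrete options; with many interacting discrete choices, a guided search becomes necessary.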
URL
https://arxiv.org/abs/2404.10620