Abstract
One of the biggest challenges in single-view 3D shape reconstruction in the wild is the scarcity of <3D shape, 2D image>-paired data from real-world environments. Inspired by the remarkable achievements of domain randomization, we propose ObjectDR, which synthesizes such paired data by randomly simulating visual variations in object appearances and backgrounds. Our data synthesis framework exploits a conditional generative model (e.g., ControlNet) to generate images conforming to spatial conditions such as 2.5D sketches, which are obtainable by rendering 3D shapes from object collections (e.g., Objaverse-XL). To simulate diverse variations while preserving the object silhouettes embedded in the spatial conditions, we also introduce a disentangled framework that leverages an initial object guidance. After synthesizing a wide range of data, we pre-train a model on it so that it learns a domain-invariant geometry prior that is consistent across various domains. We validate its effectiveness by substantially improving 3D shape reconstruction models on a real-world benchmark. In a scale-up evaluation, our pre-training achieves results 23.6% better than pre-training on high-quality computer graphics renderings.
URL
https://arxiv.org/abs/2403.14539