Paper Reading AI Learner

Object-Centric Domain Randomization for 3D Shape Reconstruction in the Wild

2024-03-21 16:40:10
Junhyeong Cho, Kim Youwang, Hunmin Yang, Tae-Hyun Oh

Abstract

One of the biggest challenges in single-view 3D shape reconstruction in the wild is the scarcity of <3D shape, 2D image>-paired data from real-world environments. Inspired by the remarkable achievements of domain randomization, we propose ObjectDR, which synthesizes such paired data by randomly simulating visual variations in object appearances and backgrounds. Our data synthesis framework exploits a conditional generative model (e.g., ControlNet) to generate images conforming to spatial conditions such as 2.5D sketches, which are obtainable by rendering 3D shapes from object collections (e.g., Objaverse-XL). To simulate diverse variations while preserving the object silhouettes embedded in the spatial conditions, we also introduce a disentangled framework that leverages an initial object guidance. After synthesizing a wide range of data, we pre-train a model on them so that it learns a domain-invariant geometry prior that is consistent across various domains. We validate its effectiveness by substantially improving 3D shape reconstruction models on a real-world benchmark. In a scale-up evaluation, our pre-training achieves results 23.6% superior to pre-training on high-quality computer graphics renderings.
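The core idea behind the disentangled synthesis (randomize object appearance and background independently while the silhouette from the spatial condition stays fixed) can be sketched as a simple masked composite. This is a minimal illustrative sketch, not the paper's actual pipeline: the two `randomize_*` helpers stand in for the conditional generative model (e.g., ControlNet) and background randomization, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W = 64, 64

# Silhouette mask rendered from the 3D shape (1 inside the object, 0 outside);
# here a synthetic circle stands in for a real rendered 2.5D sketch.
yy, xx = np.mgrid[0:H, 0:W]
mask = (((yy - H / 2) ** 2 + (xx - W / 2) ** 2) < (H / 4) ** 2).astype(np.float32)

def randomize_appearance(rng):
    """Stand-in for a conditional generative model producing a
    randomized object appearance conditioned on the silhouette."""
    return rng.uniform(0.0, 1.0, size=(H, W, 3)).astype(np.float32)

def randomize_background(rng):
    """Stand-in for a randomized background image."""
    return rng.uniform(0.0, 1.0, size=(H, W, 3)).astype(np.float32)

obj = randomize_appearance(rng)
bg = randomize_background(rng)

# Disentangled composite: however appearance and background are randomized,
# the object silhouette encoded in the mask is preserved exactly.
m = mask[..., None]
image = m * obj + (1.0 - m) * bg

print(image.shape)  # (64, 64, 3)
```

Because the mask gates the composite, every synthesized <3D shape, 2D image> pair keeps the geometry consistent with its spatial condition, which is what lets the pre-trained model learn a domain-invariant geometry prior.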

URL

https://arxiv.org/abs/2403.14539

PDF

https://arxiv.org/pdf/2403.14539.pdf
