Abstract
Constructing a physically realistic and accurately scaled simulated 3D world is crucial for the training and evaluation of embodied intelligence tasks. The diversity, realism, low-cost accessibility and affordability of 3D data assets are critical for achieving generalization and scalability in embodied AI. However, most current embodied intelligence tasks still rely heavily on traditional 3D computer graphics assets that are manually created and annotated, which suffer from high production costs and limited realism. These limitations significantly hinder the scalability of data-driven approaches. We present EmbodiedGen, a foundational platform for interactive 3D world generation. It enables the scalable, low-cost generation of high-quality, controllable and photorealistic 3D assets with accurate physical properties and real-world scale, exported in the Unified Robot Description Format (URDF). These assets can be directly imported into various physics simulation engines for fine-grained physical control, supporting downstream training and evaluation tasks. EmbodiedGen is an easy-to-use, full-featured toolkit composed of six key modules: Image-to-3D, Text-to-3D, Texture Generation, Articulated Object Generation, Scene Generation and Layout Generation. By composing generated 3D assets into diverse and interactive 3D worlds, EmbodiedGen leverages generative AI to address the generalization and evaluation challenges of embodied intelligence research. Code is available at this https URL.
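The abstract states that each asset is delivered as URDF with physical properties and real-world scale. As a rough illustration of what such a description contains, the sketch below uses Python's standard `xml.etree.ElementTree` to write a single-link URDF for a rigid object: a mesh used for both visual and collision geometry, a uniform mesh scale, a mass, and an inertia tensor. This is not EmbodiedGen's exporter. The helper names `make_object_urdf` and `box_inertia`, the solid-box inertia approximation and the example values are all hypothetical.

```python
import xml.etree.ElementTree as ET


def box_inertia(mass, size):
    """Diagonal inertia (ixx, iyy, izz) of a solid box with extents (x, y, z) in metres.

    Hypothetical approximation: the object is treated as a uniform-density box.
    """
    x, y, z = size
    k = mass / 12.0
    return (k * (y**2 + z**2), k * (x**2 + z**2), k * (x**2 + y**2))


def make_object_urdf(name, mesh_path, mass, size, scale=1.0):
    """Return a single-link URDF string for a rigid object (hypothetical helper).

    mass  : kilograms
    size  : real-world bounding-box extents in metres, after scaling
    scale : uniform factor mapping mesh units to metres
    """
    robot = ET.Element("robot", name=name)
    link = ET.SubElement(robot, "link", name=f"{name}_base")

    # Physical properties: mass and a box-approximated inertia tensor.
    inertial = ET.SubElement(link, "inertial")
    ET.SubElement(inertial, "origin", xyz="0 0 0", rpy="0 0 0")
    ET.SubElement(inertial, "mass", value=f"{mass:g}")
    ixx, iyy, izz = box_inertia(mass, size)
    ET.SubElement(
        inertial, "inertia",
        ixx=f"{ixx:.6g}", ixy="0", ixz="0",
        iyy=f"{iyy:.6g}", iyz="0", izz=f"{izz:.6g}",
    )

    # Real-world scale: the same mesh and scale are used for rendering and collision.
    mesh_scale = f"{scale:g} {scale:g} {scale:g}"
    for tag in ("visual", "collision"):
        elem = ET.SubElement(link, tag)
        geometry = ET.SubElement(elem, "geometry")
        ET.SubElement(geometry, "mesh", filename=mesh_path, scale=mesh_scale)

    return ET.tostring(robot, encoding="unicode")


if __name__ == "__main__":
    print(make_object_urdf("mug", "mug.obj", mass=0.3, size=(0.08, 0.08, 0.1)))
```

A file written this way can be loaded by simulators that accept URDF. In practice a generated asset would typically use a simplified or convex-decomposed collision mesh rather than reusing the visual mesh, and its inertia would be computed from the actual geometry instead of a bounding box.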
URL
https://arxiv.org/abs/2506.10600