Abstract
Toward the goal of generalized robotic manipulation, spatial generalization is the most fundamental capability: the policy must work robustly under different spatial distributions of objects, the environment, and the agent itself. To achieve this, substantial human demonstrations must be collected, covering diverse spatial configurations, to train a generalized visuomotor policy via imitation learning. Prior work explores a promising direction: leveraging data generation to obtain abundant, spatially diverse data from a small number of source demonstrations. However, most approaches face a significant sim-to-real gap and are often limited to constrained settings, such as fixed-base scenarios and predefined camera viewpoints. In this paper, we propose a real-to-real 3D data generation framework (R2RGen) that directly augments pointcloud observation-action pairs to generate real-world data. R2RGen is simulator- and rendering-free, making it efficient and plug-and-play. Specifically, given a single source demonstration, we introduce an annotation mechanism for fine-grained parsing of the scene and trajectory. A group-wise augmentation strategy is proposed to handle complex multi-object compositions and diverse task constraints. We further present camera-aware processing to align the distribution of generated data with that of a real-world 3D sensor. Empirically, R2RGen substantially improves data efficiency across extensive experiments and demonstrates strong potential for scaling and for application to mobile manipulation.
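To make the abstract's pipeline concrete, below is a minimal, hypothetical sketch (not the authors' implementation) of the two core operations it describes: rigidly transforming a segmented object's points together with the end-effector poses that manipulate it, and then re-filtering the augmented cloud so it matches what a single real depth camera could actually observe. The function names (`random_se2_on_table`, `augment_pair`, `camera_aware_filter`) are invented for illustration, and Open3D's `hidden_point_removal` is used only as one plausible stand-in for the paper's camera-aware processing.

```python
# Hypothetical sketch of real-to-real 3D augmentation. Assumptions:
# the scene pointcloud is segmented (an object mask is available),
# poses are 4x4 homogeneous matrices in the world frame, and objects
# rest on a table, so a planar SE(2) perturbation is sufficient.
import numpy as np
import open3d as o3d

def random_se2_on_table(max_xy=0.10, max_yaw=np.pi / 6) -> np.ndarray:
    """Sample a random planar (tabletop) rigid transform as a 4x4 matrix."""
    yaw = np.random.uniform(-max_yaw, max_yaw)
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 3] = np.random.uniform(-max_xy, max_xy, size=2)
    return T

def augment_pair(points: np.ndarray, obj_mask: np.ndarray,
                 ee_poses: np.ndarray, T: np.ndarray):
    """Move one segmented object and the action poses tied to it.

    points:   (N, 3) scene pointcloud in the world frame
    obj_mask: (N,) boolean mask selecting the object's points
    ee_poses: (K, 4, 4) end-effector poses of the trajectory segment
              that interacts with this object
    T:        (4, 4) rigid transform applied to both
    """
    points = points.copy()
    homog = np.concatenate(
        [points[obj_mask], np.ones((obj_mask.sum(), 1))], axis=1)
    points[obj_mask] = (homog @ T.T)[:, :3]   # transform object points
    ee_poses = T @ ee_poses                   # broadcast over K poses
    return points, ee_poses

def camera_aware_filter(points: np.ndarray, cam_pos: np.ndarray) -> np.ndarray:
    """Keep only points visible from the camera viewpoint, so the
    generated cloud resembles a real single-view 3D sensor's output."""
    pcd = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))
    # Large projection radius, as suggested in Open3D's documentation.
    radius = np.linalg.norm(points - cam_pos, axis=1).max() * 100.0
    _, visible_idx = pcd.hidden_point_removal(cam_pos, radius)
    return points[visible_idx]
```

A planar SE(2) perturbation is chosen here purely because tabletop objects typically only translate and rotate in the support plane; the same pattern extends to full SE(3) transforms, per-group transforms for multi-object compositions, and task-constraint checks, which the paper's group-wise augmentation strategy presumably handles in a more principled way.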
URL
https://arxiv.org/abs/2510.08547