Abstract
Scaling data and models has played a pivotal role in the remarkable progress of computer vision and language. Inspired by these domains, recent efforts in robotics have similarly focused on scaling both data and model size to develop more generalizable and robust policies. However, unlike vision and language, robotics lacks access to internet-scale demonstrations across diverse robotic tasks and environments. As a result, existing datasets remain limited in scale by the need for manual data collection and curation. To address this problem, we propose BLAZER, a framework that learns manipulation policies from automatically generated training data. We build on the zero-shot capabilities of LLM planners and automatically generate demonstrations for diverse manipulation tasks in simulation. Successful examples are then used to finetune an LLM and improve its planning capabilities without human supervision. Notably, while BLAZER training requires access to the simulator's state, we demonstrate direct transfer of the acquired skills to sensor-based manipulation. Through extensive experiments, we show that BLAZER significantly improves zero-shot manipulation in both simulated and real environments. Moreover, BLAZER improves on tasks outside of its training pool and enables downscaling of LLMs. Our code and data will be made publicly available on the project page.
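The self-improvement loop described in the abstract — a zero-shot LLM planner proposes plans, the simulator verifies them, and only successful rollouts become finetuning data — can be sketched as below. All function names and data structures here are hypothetical placeholders for illustration, not the authors' actual API; in particular, `llm_plan`, `simulate`, and `finetune` stand in for an LLM query, a simulator rollout with state-based success checking, and supervised finetuning, respectively.

```python
def llm_plan(task, model):
    """Stand-in for querying an LLM planner; returns a plan string."""
    return f"{model}: pick({task['object']}) -> place({task['target']})"

def simulate(task, plan):
    """Stand-in for rolling out a plan in simulation.

    Success here is decided by a toy string check; a real simulator
    would execute the plan and inspect the resulting state.
    """
    return task["object"] in plan and task["target"] in plan

def collect_demonstrations(tasks, model="base-llm"):
    """Generate plans and keep only verified successes (no human labels)."""
    demos = []
    for task in tasks:
        plan = llm_plan(task, model)
        if simulate(task, plan):
            demos.append({"task": task, "plan": plan})
    return demos

def finetune(model, demos):
    """Stand-in for finetuning the LLM on successful demonstrations."""
    return f"{model}+ft({len(demos)})"

tasks = [
    {"object": "red_cube", "target": "tray"},
    {"object": "mug", "target": "shelf"},
]
demos = collect_demonstrations(tasks)
improved_model = finetune("base-llm", demos)
print(improved_model)
```

The key design point this sketch captures is that the simulator acts as an automatic success filter, so the finetuning set grows without any human-provided demonstrations.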
URL
https://arxiv.org/abs/2510.08572