Paper Reading AI Learner

BLAZER: Bootstrapping LLM-based Manipulation Agents with Zero-Shot Data Generation

2025-10-09 17:59:58
Rocktim Jyoti Das, Harsh Singh, Diana Turmakhan, Muhammad Abdullah Sohail, Mingfei Han, Preslav Nakov, Fabio Pizzati, Ivan Laptev

Abstract

Scaling data and models has played a pivotal role in the remarkable progress of computer vision and language. Inspired by these domains, recent efforts in robotics have similarly focused on scaling both data and model size to develop more generalizable and robust policies. However, unlike vision and language, robotics lacks access to internet-scale demonstrations across diverse robotic tasks and environments. As a result, the scale of existing datasets is typically limited by the need for manual data collection and curation. To address this problem, here we propose BLAZER, a framework that learns manipulation policies from automatically generated training data. We build on the zero-shot capabilities of LLM planners and automatically generate demonstrations for diverse manipulation tasks in simulation. Successful examples are then used to finetune an LLM and to improve its planning capabilities without human supervision. Notably, while BLAZER training requires access to the simulator's state, we demonstrate direct transfer of acquired skills to sensor-based manipulation. Through extensive experiments, we show BLAZER to significantly improve zero-shot manipulation in both simulated and real environments. Moreover, BLAZER improves on tasks outside of its training pool and enables downscaling of LLMs. Our code and data will be made publicly available on the project page.
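
The abstract describes a simple bootstrapping pipeline: a zero-shot LLM planner proposes plans from the simulator's state, the plans are executed in simulation, and only successful episodes are kept as finetuning data for the LLM. The sketch below illustrates that loop; all names (Episode, query_planner, execute_in_sim, etc.) are hypothetical stand-ins, not BLAZER's actual API.

```python
# Hypothetical sketch of the bootstrapping loop described in the abstract:
# a zero-shot LLM planner proposes plans from simulator state, plans are
# executed in simulation, and successful rollouts become finetuning data.
# All identifiers below are illustrative assumptions, not the authors' code.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Episode:
    task: str        # natural-language task description
    sim_state: str   # serialized simulator state shown to the planner
    plan: str        # plan produced by the LLM planner
    success: bool    # outcome reported by the simulator


def bootstrap_demonstrations(
    tasks: List[str],
    get_sim_state: Callable[[str], str],         # reset sim for a task, return its state
    query_planner: Callable[[str, str], str],    # zero-shot LLM: (task, state) -> plan
    execute_in_sim: Callable[[str, str], bool],  # run plan in sim, return success flag
    rollouts_per_task: int = 10,
) -> List[Episode]:
    """Generate demonstrations with a zero-shot planner and keep only successes."""
    successes: List[Episode] = []
    for task in tasks:
        for _ in range(rollouts_per_task):
            state = get_sim_state(task)
            plan = query_planner(task, state)
            if execute_in_sim(task, plan):  # success filter: no human supervision
                successes.append(Episode(task, state, plan, True))
    return successes


def to_finetuning_examples(episodes: List[Episode]) -> List[dict]:
    """Convert successful episodes into (prompt, target) pairs for LLM finetuning."""
    return [
        {"prompt": f"Task: {e.task}\nState: {e.sim_state}\nPlan:", "target": e.plan}
        for e in episodes
    ]
```

In this sketch the caller supplies the simulator and planner hooks; the key point from the abstract is the success filter, which turns unsupervised zero-shot rollouts into supervised finetuning pairs without any manual data collection.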

URL

https://arxiv.org/abs/2510.08572

PDF

https://arxiv.org/pdf/2510.08572.pdf

