DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning

2024-10-31 17:48:45
Zhenyu Jiang, Yuqi Xie, Kevin Lin, Zhenjia Xu, Weikang Wan, Ajay Mandlekar, Linxi Fan, Yuke Zhu

Abstract

Imitation learning from human demonstrations is an effective means to teach robots manipulation skills. But data acquisition is a major bottleneck in applying this paradigm more broadly, due to the amount of cost and human effort involved. There has been significant interest in imitation learning for bimanual dexterous robots, like humanoids. Unfortunately, data collection is even more challenging here due to the challenges of simultaneously controlling multiple arms and multi-fingered hands. Automated data generation in simulation is a compelling, scalable alternative to fuel this need for data. To this end, we introduce DexMimicGen, a large-scale automated data generation system that synthesizes trajectories from a handful of human demonstrations for humanoid robots with dexterous hands. We present a collection of simulation environments in the setting of bimanual dexterous manipulation, spanning a range of manipulation behaviors and different requirements for coordination among the two arms. We generate 21K demos across these tasks from just 60 source human demos and study the effect of several data generation and policy learning decisions on agent performance. Finally, we present a real-to-sim-to-real pipeline and deploy it on a real-world humanoid can sorting task. Videos and more are at this https URL

URL

https://arxiv.org/abs/2410.24185

PDF

https://arxiv.org/pdf/2410.24185.pdf

