Paper Reading AI Learner

DiffuseLoco: Real-Time Legged Locomotion Control with Diffusion from Offline Datasets

2024-04-30 05:10:59
Xiaoyu Huang, Yufeng Chi, Ruofeng Wang, Zhongyu Li, Xue Bin Peng, Sophia Shao, Borivoje Nikolic, Koushil Sreenath

Abstract

This work introduces DiffuseLoco, a framework for training multi-skill diffusion-based policies for dynamic legged locomotion from offline datasets, enabling real-time control of diverse skills on robots in the real world. Offline learning at scale has led to breakthroughs in computer vision, natural language processing, and robotic manipulation. However, scaling up learning for legged robot locomotion, especially with multiple skills in a single policy, presents significant challenges for prior online reinforcement learning methods. To address this challenge, we propose a novel, scalable framework that leverages diffusion models to learn directly from offline multimodal datasets containing a diverse set of locomotion skills. With design choices tailored for real-time control of dynamical systems, including receding horizon control and delayed inputs, DiffuseLoco can reproduce multimodality across various locomotion skills, transfer zero-shot to real quadrupedal robots, and run on edge computing devices. Furthermore, DiffuseLoco demonstrates free transitions between skills and robustness against environmental variations. Through extensive benchmarking in real-world experiments, DiffuseLoco exhibits better stability and velocity tracking performance than prior reinforcement learning and non-diffusion-based behavior cloning baselines. The design choices are validated via comprehensive ablation studies. This work opens new possibilities for scaling up learning-based legged locomotion controllers by scaling up large, expressive models and diverse offline datasets.
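The abstract names receding horizon control and delayed inputs as design choices without detailing them. The following is a minimal sketch of the general idea under stated assumptions, not DiffuseLoco's implementation: it assumes a policy that returns a chunk of future actions (a hypothetical stand-in for a diffusion sampler) and models the input delay as a whole number of control steps. Only the first `execute_k` actions of each chunk are executed before replanning, and each plan is conditioned on an observation that is `obs_delay` steps old. The names `receding_horizon_rollout`, `policy`, `env_step`, `execute_k`, and `obs_delay` are all hypothetical, and the paper may define the delay differently.

```python
from collections import deque
from typing import Callable, List


def receding_horizon_rollout(
    policy: Callable[[float], List[float]],
    env_step: Callable[[float, float], float],
    x0: float,
    total_steps: int,
    execute_k: int,
    obs_delay: int = 0,
) -> List[float]:
    """Run a receding-horizon control loop with delayed observations.

    `policy` maps an observation to a chunk of future actions (hypothetical
    stand-in for a diffusion sampler). Only the first `execute_k` actions of
    each chunk are executed before the policy is queried again.
    """
    if execute_k < 1 or obs_delay < 0:
        raise ValueError("execute_k must be >= 1 and obs_delay >= 0")
    # history[0] is the observation from `obs_delay` steps ago
    # (padded with x0 until enough steps have elapsed).
    history = deque([x0] * (obs_delay + 1), maxlen=obs_delay + 1)
    x = x0
    plan: List[float] = []
    actions: List[float] = []
    while len(actions) < total_steps:
        if not plan:
            # Replan from the delayed observation; discard the chunk's tail.
            plan = list(policy(history[0]))[:execute_k]
            if not plan:
                raise ValueError("policy returned an empty action chunk")
        a = plan.pop(0)
        x = env_step(x, a)
        history.append(x)
        actions.append(a)
    return actions


if __name__ == "__main__":
    halfway_to_zero = lambda o: [(0.0 - o) / 2] * 4  # toy 4-step "plan"
    integrator = lambda x, a: x + a                   # toy dynamics
    print(receding_horizon_rollout(halfway_to_zero, integrator, 8.0, 4, 2))
    print(receding_horizon_rollout(halfway_to_zero, integrator, 8.0, 4, 2, obs_delay=1))
```

With the toy policy and dynamics above, the undelayed run yields `[-4.0, -4.0, 0.0, 0.0]`, while `obs_delay=1` yields `[-4.0, -4.0, -2.0, -2.0]`: the second plan is computed from a stale observation (4.0 instead of 0.0), which is the kind of mismatch a controller trained with delayed inputs would need to tolerate.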

URL

https://arxiv.org/abs/2404.19264

PDF

https://arxiv.org/pdf/2404.19264.pdf
