Paper Reading AI Learner

Learning H-Infinity Locomotion Control

2024-04-22 17:59:07
Junfeng Long, Wenye Yu, Quanyi Li, Zirui Wang, Dahua Lin, Jiangmiao Pang

Abstract

Stable locomotion in precipitous environments is an essential capability of quadruped robots, demanding the ability to resist various external disturbances. However, recent learning-based policies rely only on basic domain randomization to improve robustness, which cannot guarantee that the robot has adequate disturbance-resistance capabilities. In this paper, we propose to model the learning process as an adversarial interaction between the actor and a newly introduced disturber, and to ensure their optimization with an $H_{\infty}$ constraint. In contrast to the actor, which maximizes the discounted overall reward, the disturber is responsible for generating effective external forces and is optimized by maximizing the error between the task reward and its oracle, i.e., the "cost", in each iteration. To keep the joint optimization between the actor and the disturber stable, our $H_{\infty}$ constraint bounds the ratio between the cost and the intensity of the external forces. Through reciprocal interaction throughout the training phase, the actor acquires the capability to withstand increasingly complex physical disturbances. We verify the robustness of our approach on quadrupedal locomotion tasks with the Unitree Aliengo robot, and also on a more challenging task with the Unitree A1 robot, where the quadruped is expected to perform locomotion merely on its hind legs, as if it were a bipedal robot. The simulated quantitative results show improvement over baselines, demonstrating the effectiveness of the method and of each design choice. In addition, real-robot experiments qualitatively exhibit how robust the policy is when subjected to various disturbances on various terrains, including stairs, high platforms, slopes, and slippery surfaces. All code, checkpoints, and real-world deployment guidance will be made public.
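The $H_{\infty}$ constraint in the abstract follows the standard disturbance-attenuation idea from robust control; the sketch below states that condition in the abstract's terms. The symbols are assumptions for illustration ($c_t$ for the per-step cost, $d_t$ for the disturber's external force, $\eta$ for the bound), not necessarily the paper's exact notation:

```latex
% H-infinity attenuation condition (sketch; notation assumed, not the
% paper's): the accumulated cost must stay bounded by eta times the
% accumulated disturbance energy, for any disturbance sequence.
\sup_{d \neq 0} \;
  \frac{\sum_{t} c_t}{\sum_{t} \lVert d_t \rVert^{2}} \;\le\; \eta
% Equivalently, per trajectory:
%   \sum_{t} c_t - \eta \sum_{t} \lVert d_t \rVert^{2} \le 0,
% which caps how much cost the disturber can induce per unit of force it
% applies, keeping the actor-disturber game well-posed during training.
```

Intuitively, without such a bound the disturber could win simply by applying ever-larger forces; tying the admissible cost to the disturbance intensity keeps the adversarial optimization stable.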

Abstract (translated)

Achieving stable locomotion in precipitous environments is a key capability of quadruped robots, requiring the ability to resist various external disturbances. However, recent learning-based policies use only basic domain randomization to improve the robustness of the learned policy, which cannot guarantee that the robot has sufficient disturbance-resistance capability. In this paper, we model the learning process as an adversarial interaction between the actor and a newly introduced disturber, and ensure their optimization with an $H_{\infty}$ constraint. Unlike the actor, which maximizes the accumulated reward, the disturber is responsible for generating effective external forces and is optimized by maximizing the error between the task reward and its oracle, i.e., the "cost", in each iteration. To keep the joint optimization between the actor and the disturber stable, our $H_{\infty}$ constraint requires a bound on the ratio between the cost and the intensity of the external forces. Through mutual interaction during training, the actor can acquire the ability to navigate increasingly complex physical disturbances. We verify the effectiveness of our method on the Unitree Aliengo robot, and also carry out a more challenging task with the Unitree A1 robot, where the quadruped is expected to perform locomotion only on its hind legs, as if it were a bipedal robot. Simulated quantitative results show improvement over baselines, demonstrating the effectiveness of the method and of each design choice. In addition, real-robot experiments qualitatively evaluate the robustness of the policy under disturbances on various terrains. All code, checkpoints, and real-robot deployment guidance will be made public.

URL

https://arxiv.org/abs/2404.14405

PDF

https://arxiv.org/pdf/2404.14405.pdf

