RACER: Epistemic Risk-Sensitive RL Enables Fast Driving with Fewer Crashes

2024-05-07 23:32:36
Kyle Stachowicz, Sergey Levine

Abstract

Reinforcement learning provides an appealing framework for robotic control due to its ability to learn expressive policies purely through real-world interaction. However, this requires addressing real-world constraints and avoiding catastrophic failures during training, which might severely impede both learning progress and the performance of the final policy. In many robotics settings, this amounts to avoiding certain "unsafe" states. The high-speed off-road driving task represents a particularly challenging instantiation of this problem: a high-return policy should drive as aggressively and as quickly as possible, which often requires getting close to the edge of the set of "safe" states, and therefore places a particular burden on the method to avoid frequent failures. To both learn highly performant policies and avoid excessive failures, we propose a reinforcement learning framework that combines risk-sensitive control with an adaptive action space curriculum. Furthermore, we show that our risk-sensitive objective automatically avoids out-of-distribution states when equipped with an estimator for epistemic uncertainty. We implement our algorithm on a small-scale rally car and show that it is capable of learning high-speed policies for a real-world off-road driving task. We show that our method greatly reduces the number of safety violations during the training process, and actually leads to higher-performance policies in both driving and non-driving simulation environments with similar challenges.
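The abstract names the two key ingredients without spelling them out: a risk-sensitive objective that, given an epistemic-uncertainty estimate, penalizes actions whose outcomes the model is unsure about, and a curriculum that gradually widens the action space. As a rough illustration only (the exact formulation is not given here), the following minimal Python sketch instantiates both under common assumptions: an ensemble of Q-estimates whose spread proxies epistemic uncertainty, and a scalar throttle limit as the curriculum variable. All names (risk_sensitive_value, update_action_limit, beta) are hypothetical, not taken from the paper.

import numpy as np

def risk_sensitive_value(q_ensemble, beta=1.0):
    # Lower-confidence-bound value over an ensemble of Q estimates.
    # The ensemble's standard deviation is a proxy for epistemic
    # uncertainty; subtracting it makes the agent prefer actions whose
    # outcomes the models agree on, implicitly steering it away from
    # out-of-distribution states. (Illustrative assumption, not the
    # paper's exact objective.)
    q = np.asarray(q_ensemble)  # shape: (n_members, n_actions)
    return q.mean(axis=0) - beta * q.std(axis=0)

def update_action_limit(limit, crashed, grow=1.05, shrink=0.8, lo=0.2, hi=1.0):
    # Adaptive action-space curriculum (hypothetical schedule): widen
    # the allowed throttle range after a safe episode, contract it
    # after a crash.
    limit *= shrink if crashed else grow
    return float(np.clip(limit, lo, hi))

# Example: three ensemble members scoring two candidate actions.
q_preds = [[1.0, 2.0], [1.1, 0.4], [0.9, 3.6]]
scores = risk_sensitive_value(q_preds, beta=1.0)
best_action = int(np.argmax(scores))  # picks action 0, despite action 1's higher mean

Here beta trades off return against caution: beta = 0 recovers the ordinary mean-ensemble value, while larger beta rejects actions with a high mean but high disagreement across the ensemble.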

URL

https://arxiv.org/abs/2405.04714

PDF

https://arxiv.org/pdf/2405.04714.pdf

