Paper Reading AI Learner

Long and Short-Term Constraints Driven Safe Reinforcement Learning for Autonomous Driving

2024-03-27 02:41:52
Xuemin Hu, Pan Chen, Yijun Wen, Bo Tang, Long Chen

Abstract

Reinforcement learning (RL) has been widely used in decision-making tasks, but it cannot guarantee the agent's safety during training because the agent must interact with the environment, which seriously limits its industrial applications such as autonomous driving. Safe RL methods address this issue by constraining the expected safety-violation cost as a training objective, but they still permit unsafe states to occur, which is unacceptable in autonomous driving tasks. Moreover, it is difficult for these methods to balance the cost and return expectations, which degrades the algorithms' learning performance. In this paper, we propose a novel safe RL algorithm based on long and short-term constraints (LSTC). The short-term constraint aims to guarantee the safety of the states the vehicle explores in the near term, while the long-term constraint ensures the overall safety of the vehicle throughout the decision-making process. In addition, we develop a safe RL method with dual-constraint optimization based on the Lagrange multiplier to optimize the training process for end-to-end autonomous driving. Comprehensive experiments were conducted on the MetaDrive simulator. Experimental results demonstrate that the proposed method achieves higher safety in continuous state and action tasks, and exhibits higher exploration performance in long-distance decision-making tasks, compared with state-of-the-art methods.
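The dual-constraint optimization the abstract mentions can be illustrated with a minimal sketch of the standard Lagrangian-relaxation pattern for constrained RL: the objective is penalized by one multiplier per cost constraint, and each multiplier is updated by dual ascent so it grows while its constraint is violated and shrinks (floored at zero) otherwise. All function names, the two-constraint setup, and the numbers below are illustrative assumptions, not the paper's actual implementation.

```python
# Lagrangian relaxation with one multiplier per constraint (long- and
# short-term costs), a common scheme for safe RL. Illustrative sketch only.

def lagrangian(ret, long_cost, short_cost, lam_long, lam_short,
               d_long=0.0, d_short=0.0):
    """Penalized objective: expected return minus weighted constraint
    violations. d_long / d_short are the allowed cost budgets."""
    return (ret
            - lam_long * (long_cost - d_long)
            - lam_short * (short_cost - d_short))

def dual_ascent(lam, cost, budget, lr=0.1):
    """Multiplier update: increases while the constraint is violated
    (cost > budget), decreases otherwise, and is clipped at zero."""
    return max(0.0, lam + lr * (cost - budget))

# Toy usage: with a violated long-term constraint, its multiplier grows,
# which in turn lowers the penalized objective on the next evaluation.
lam_long, lam_short = 0.0, 0.0
obj_before = lagrangian(1.0, 0.5, 0.0, lam_long, lam_short)
lam_long = dual_ascent(lam_long, cost=0.5, budget=0.0)
obj_after = lagrangian(1.0, 0.5, 0.0, lam_long, lam_short)
```

In practice the policy parameters are updated by gradient ascent on the penalized objective between dual-ascent steps; the sketch isolates only the multiplier mechanics.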

URL

https://arxiv.org/abs/2403.18209

PDF

https://arxiv.org/pdf/2403.18209.pdf

