
LearningFlow: Automated Policy Learning Workflow for Urban Driving with Large Language Models

2025-01-09 08:28:16
Zengqi Peng, Yubin Wang, Xu Han, Lei Zheng, Jun Ma

Abstract

Recent advancements in reinforcement learning (RL) demonstrate significant potential for autonomous driving. Despite this promise, challenges such as the manual design of reward functions and low sample efficiency in complex environments continue to impede the development of safe and effective driving policies. To tackle these issues, we introduce LearningFlow, an innovative automated policy learning workflow tailored to urban driving. This framework leverages the collaboration of multiple large language model (LLM) agents throughout the RL training process. LearningFlow includes a curriculum sequence generation process and a reward generation process, which work in tandem to guide the RL policy by generating tailored training curricula and reward functions. In particular, each process is supported by an analysis agent that evaluates training progress and provides critical insights to the generation agent. Through the collaborative efforts of these LLM agents, LearningFlow automates policy learning across a series of complex driving tasks, significantly reducing the reliance on manual reward function design while enhancing sample efficiency. Comprehensive experiments are conducted in the high-fidelity CARLA simulator, along with comparisons against existing methods, to demonstrate the efficacy of the proposed approach. The results show that LearningFlow excels in generating rewards and curricula. It also achieves superior performance and robust generalization across various driving tasks, as well as commendable adaptation to different RL algorithms.
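
The two-process structure described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the LLM agents are replaced by plain functions, the RL trainer is a deterministic stub, and every name here (Process, run_workflow, the "stage_*" task labels, the 0.5 threshold, the penalty weights) is a hypothetical placeholder chosen only to show how an analysis agent might feed a generation agent in each process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Stats = Dict[str, float]
RewardFn = Callable[[Dict[str, float]], float]


@dataclass
class Process:
    """One generation process: an analysis step feeding a generation step.

    Both callables are stand-ins for LLM agents (assumption, not the paper's API).
    """
    analyze: Callable[[Stats], str]
    generate: Callable[[str], object]

    def step(self, stats: Stats) -> object:
        insight = self.analyze(stats)   # analysis agent evaluates training progress
        return self.generate(insight)   # generation agent acts on that insight


def analyze_progress(stats: Stats) -> str:
    # Hypothetical rule: advance once the success rate reaches 0.5.
    return "advance" if stats["success_rate"] >= 0.5 else "hold"


def make_curriculum_generator(tasks: List[str]) -> Callable[[str], str]:
    index = 0

    def generate(insight: str) -> str:
        nonlocal index
        if insight == "advance" and index < len(tasks) - 1:
            index += 1
        return tasks[index]

    return generate


def generate_reward(insight: str) -> RewardFn:
    # Hypothetical: penalize collisions more heavily while the policy struggles.
    penalty = 1.0 if insight == "advance" else 2.0

    def reward(obs: Dict[str, float]) -> float:
        return obs["progress"] - penalty * obs["collision"]

    return reward


def make_stub_trainer() -> Callable[[str, RewardFn], Stats]:
    rounds_seen: Dict[str, int] = {}

    def train(task: str, reward_fn: RewardFn) -> Stats:
        # Stub for an RL training round: success grows with repeated exposure.
        rounds_seen[task] = rounds_seen.get(task, 0) + 1
        sample_reward = reward_fn({"progress": 1.0, "collision": 0.0})
        return {
            "success_rate": min(1.0, 0.3 * rounds_seen[task]),
            "sample_reward": sample_reward,
        }

    return train


def run_workflow(curriculum: Process, reward: Process,
                 train: Callable[[str, RewardFn], Stats],
                 rounds: int) -> List[Tuple[str, float]]:
    stats: Stats = {"success_rate": 0.0}
    history: List[Tuple[str, float]] = []
    for _ in range(rounds):
        task = curriculum.step(stats)     # curriculum sequence generation process
        reward_fn = reward.step(stats)    # reward generation process
        stats = train(task, reward_fn)    # one RL training round on that task
        history.append((task, stats["success_rate"]))
    return history


tasks = ["stage_0", "stage_1", "stage_2"]  # placeholder curriculum stages
history = run_workflow(
    Process(analyze_progress, make_curriculum_generator(tasks)),
    Process(analyze_progress, generate_reward),
    make_stub_trainer(),
    rounds=5,
)
print([task for task, _ in history])
```

In this toy run the curriculum stays on a stage until the stubbed success rate crosses the threshold, then moves on. In a real system the analysis and generation steps would be LLM calls and the trainer would be an RL algorithm running in a simulator such as CARLA.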

URL

https://arxiv.org/abs/2501.05057

PDF

https://arxiv.org/pdf/2501.05057.pdf

