Abstract
Recent advances in reinforcement learning (RL) have demonstrated significant potential for autonomous driving. Despite this promise, challenges such as manually designed reward functions and low sample efficiency in complex environments continue to impede the development of safe and effective driving policies. To tackle these issues, we introduce LearningFlow, an automated policy-learning workflow tailored to urban driving. The framework orchestrates multiple large language model (LLM) agents throughout the RL training process. LearningFlow comprises a curriculum sequence generation process and a reward generation process, which work in tandem to guide the RL policy by producing tailored training curricula and reward functions. In particular, each process is supported by an analysis agent that evaluates training progress and supplies critical insights to the corresponding generation agent. Through the collaboration of these LLM agents, LearningFlow automates policy learning across a series of complex driving tasks, substantially reducing reliance on manual reward design while improving sample efficiency. Comprehensive experiments in the high-fidelity CARLA simulator, including comparisons with existing methods, demonstrate the efficacy of the proposed approach. The results show that LearningFlow excels at generating rewards and curricula, achieves superior performance and robust generalization across diverse driving tasks, and adapts well to different RL algorithms.
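To make the described workflow concrete, below is a minimal Python sketch of the dual analysis/generation loop the abstract outlines: each of the two processes (curriculum and reward) pairs an analysis agent with a generation agent, and their outputs drive the next RL training round. All names here (`llm`, `analyze`, `generate_curriculum`, `generate_reward`, `train_rl_policy`, `TrainingStats`) are hypothetical placeholders, not the paper's actual implementation; the LLM and RL calls are stubbed so the sketch runs end to end.

```python
from dataclasses import dataclass


@dataclass
class TrainingStats:
    """Aggregate statistics from the most recent RL training round."""
    success_rate: float
    avg_episode_return: float
    collision_rate: float


def llm(prompt: str) -> str:
    # Stub for a real LLM call; returns a canned reply so the sketch runs.
    return f"[LLM reply to: {prompt[:50]}...]"


def analyze(role: str, stats: TrainingStats) -> str:
    # Analysis agent: evaluates training progress and surfaces failure
    # modes for the corresponding generation agent.
    return llm(f"As the {role} analysis agent, assess these stats: {stats}")


def generate_curriculum(insights: str) -> str:
    # Curriculum generation agent: proposes the next training scenario
    # (e.g., denser traffic, unprotected turns) from the analysis.
    return llm(f"Propose the next CARLA training scenario: {insights}")


def generate_reward(insights: str) -> str:
    # Reward generation agent: emits a reward function conditioned on the
    # analysis, replacing manual reward design.
    return llm(f"Write a reward function for urban driving: {insights}")


def train_rl_policy(scenario: str, reward_code: str) -> TrainingStats:
    # Stub for one RL training round (e.g., PPO in CARLA); real code would
    # load the scenario, apply the generated reward, and roll out episodes.
    return TrainingStats(success_rate=0.5, avg_episode_return=10.0,
                         collision_rate=0.1)


def learning_flow(rounds: int) -> TrainingStats:
    stats = TrainingStats(0.0, 0.0, 1.0)  # cold start
    for _ in range(rounds):
        scenario = generate_curriculum(analyze("curriculum", stats))
        reward_code = generate_reward(analyze("reward", stats))
        stats = train_rl_policy(scenario, reward_code)
    return stats


if __name__ == "__main__":
    print(learning_flow(rounds=3))
```

In this reading, the two analysis/generation pairs run every round so the curriculum and reward co-evolve with the policy, which is the LLM-agent collaboration the abstract credits with removing manual reward design.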
URL
https://arxiv.org/abs/2501.05057