Abstract
Autonomous vehicles (AVs) need to reason about the multimodal behavior of neighboring agents while planning their own motion. Many existing trajectory planners seek a single trajectory that performs well under \emph{all} plausible futures simultaneously, ignoring bi-directional interactions and thus leading to overly conservative plans. Policy planning, whereby the ego agent plans a policy that reacts to the environment's multimodal behavior, is a promising direction, as it can account for the action-reaction interactions between the AV and the environment. However, most existing policy planners do not scale to the complexity of real autonomous vehicle applications: they are either not compatible with modern deep learning prediction models, not interpretable, or not able to generate high-quality trajectories. To fill this gap, we propose Tree Policy Planning (TPP), a policy planner that is compatible with state-of-the-art deep learning prediction models, generates multistage motion plans, and accounts for the influence of the ego agent on the environment's behavior. The key idea of TPP is to reduce the continuous optimization problem to a tractable discrete MDP through the construction of two tree structures: an ego trajectory tree for ego trajectory options, and a scenario tree for multimodal ego-conditioned environment predictions. We demonstrate the efficacy of TPP in closed-loop simulations based on the real-world nuScenes dataset; the results show that TPP scales to realistic AV scenarios and significantly outperforms non-policy baselines.
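To make the key idea concrete, the following is a minimal, hypothetical sketch of how policy planning over the two trees could reduce to backward induction on a discrete MDP. The class names, the toy reward table, and the recursion are illustrative assumptions, not the paper's implementation: each scenario-tree node is one predicted environment mode with a branch probability, each ego-tree node is one candidate trajectory segment, and the ego picks its next segment only after the environment reveals its mode — which is what makes the result a reactive policy rather than a single trajectory.

```python
# Hypothetical sketch (not the paper's code): solve the discrete MDP
# formed by pairing an ego trajectory tree with a scenario tree.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ScenarioNode:
    """One predicted environment mode; children are next-stage modes."""
    mode_id: int
    prob: float  # probability of this branch given its parent
    children: List["ScenarioNode"] = field(default_factory=list)


@dataclass
class EgoNode:
    """One candidate ego trajectory segment; children are continuations."""
    segment_id: int
    children: List["EgoNode"] = field(default_factory=list)


def policy_value(ego: EgoNode, scen: ScenarioNode,
                 reward: Callable[[EgoNode, ScenarioNode], float]) -> float:
    """Expected value of the best reactive policy rooted at (ego, scen).

    At each stage the environment reveals its mode, and the ego then
    chooses the child segment maximizing expected future reward.
    """
    r = reward(ego, scen)
    if not scen.children:  # leaf stage: no further decisions
        return r
    future = 0.0
    for child_scen in scen.children:
        # React to the revealed mode: pick the best ego continuation.
        best = max(
            (policy_value(child_ego, child_scen, reward)
             for child_ego in ego.children),
            default=0.0,
        )
        future += child_scen.prob * best
    return r + future
```

A toy instance shows why a policy beats a fixed trajectory: with two environment modes (p = 0.6 and 0.4) and two ego continuations that each do well in only one mode, the reactive policy attains the mode-wise best of both branches, whereas committing to either single continuation up front yields a strictly lower expected reward.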
URL
https://arxiv.org/abs/2301.11902