Paper Reading AI Learner

Tree-structured Policy Planning with Learned Behavior Models

2023-01-27 18:27:26
Yuxiao Chen, Peter Karkus, Boris Ivanovic, Xinshuo Weng, Marco Pavone

Abstract

Autonomous vehicles (AVs) need to reason about the multimodal behavior of neighboring agents while planning their own motion. Many existing trajectory planners seek a single trajectory that performs well under \emph{all} plausible futures simultaneously, ignoring bi-directional interactions and thus leading to overly conservative plans. Policy planning, whereby the ego agent plans a policy that reacts to the environment's multimodal behavior, is a promising direction as it can account for the action-reaction interactions between the AV and the environment. However, most existing policy planners do not scale to the complexity of real autonomous vehicle applications: they are either not compatible with modern deep learning prediction models, not interpretable, or not able to generate high quality trajectories. To fill this gap, we propose Tree Policy Planning (TPP), a policy planner that is compatible with state-of-the-art deep learning prediction models, generates multistage motion plans, and accounts for the influence of ego agent on the environment behavior. The key idea of TPP is to reduce the continuous optimization problem into a tractable discrete MDP through the construction of two tree structures: an ego trajectory tree for ego trajectory options, and a scenario tree for multi-modal ego-conditioned environment predictions. We demonstrate the efficacy of TPP in closed-loop simulations based on real-world nuScenes dataset and results show that TPP scales to realistic AV scenarios and significantly outperforms non-policy baselines.

Abstract (translated)

无人驾驶车辆(AVs)需要在规划自身运动的同时,考虑相邻代理的多种模式行为。许多现有的轨迹规划师都在寻找一种能够同时满足所有可能的未来趋势的单一轨迹,而忽视了双向交互,从而可能导致过于保守的计划。政策规划是 ego 代理计划一个政策,以对周围环境的多种模式行为作出反应,这是一个有前途的方向,因为它可以考虑到 AV 和周围环境之间的行动-反应相互作用。然而, most existing Policy Planners do not scale to the complexity of real autonomous vehicle applications: they are either not compatible with modern deep learning prediction models, not interpretable, or not able to generate high-quality轨迹。为了填补这一空缺,我们提出了 Tree Policy Planning(TPP),这是一个与先进的深度学习预测模型兼容的政策规划师,生成多个运动计划,并考虑 ego 代理对周围环境的影响。TPP 的关键思想是通过构建两个树结构来将连续优化问题转化为可处理离散的 MDP:一个 ego 轨迹树用于 ego 轨迹选择,一个场景树用于多模态 ego 条件环境预测。我们基于真实世界 nuScenes 数据集开展了闭循环模拟,并结果显示,TPP 可以扩展到真实的 AV 场景,并显著优于非政策基准。

URL

https://arxiv.org/abs/2301.11902

PDF

https://arxiv.org/pdf/2301.11902.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot