PLUTO: Pushing the Limit of Imitation Learning-based Planning for Autonomous Driving

2024-04-22 16:38:41
Jie Cheng, Yingbing Chen, Qifeng Chen

Abstract

We present PLUTO, a powerful framework that pushes the limit of imitation learning-based planning for autonomous driving. Our improvements stem from three pivotal aspects: a longitudinal-lateral aware model architecture that enables flexible and diverse driving behaviors; an innovative auxiliary loss computation method that is broadly applicable and efficient for batch-wise calculation; and a novel training framework that leverages contrastive learning, augmented by a suite of new data augmentations to regulate driving behaviors and facilitate the understanding of underlying interactions. We assessed our framework using the large-scale real-world nuPlan dataset and its associated standardized planning benchmark. Impressively, PLUTO achieves state-of-the-art closed-loop performance, beating other competing learning-based methods and, for the first time, surpassing the current top-performing rule-based planner. Results and code are available at this https URL.

URL

https://arxiv.org/abs/2404.14327

PDF

https://arxiv.org/pdf/2404.14327.pdf

