Paper Reading AI Learner

VAD: Vectorized Scene Representation for Efficient Autonomous Driving

2023-03-21 17:59:22
Bo Jiang, Shaoyu Chen, Qing Xu, Bencheng Liao, Jiajie Chen, Helong Zhou, Qian Zhang, Wenyu Liu, Chang Huang, Xinggang Wang

Abstract

Autonomous driving requires a comprehensive understanding of the surrounding environment for reliable trajectory planning. Previous works rely on dense rasterized scene representations (e.g., agent occupancy and semantic maps) to perform planning, which are computationally intensive and miss instance-level structure information. In this paper, we propose VAD, an end-to-end vectorized paradigm for autonomous driving, which models the driving scene as a fully vectorized representation. The proposed vectorized paradigm has two significant advantages. On one hand, VAD exploits vectorized agent motion and map elements as explicit instance-level planning constraints, which effectively improves planning safety. On the other hand, VAD runs much faster than previous end-to-end planning methods by getting rid of computation-intensive rasterized representations and hand-designed post-processing steps. VAD achieves state-of-the-art end-to-end planning performance on the nuScenes dataset, outperforming the previous best method by a large margin (reducing the average collision rate by 48.4%). Moreover, VAD greatly improves inference speed (by up to 9.3x), which is critical for the real-world deployment of an autonomous driving system. Code and models will be released to facilitate future research.
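The contrast the abstract draws between rasterized and vectorized scene representations can be made concrete with a minimal sketch. All shapes, sizes, and names below are illustrative assumptions, not VAD's actual data structures: a lane is stored once as a dense occupancy grid and once as a sparse polyline, and an instance-level planning constraint is approximated as the distance from a planned ego waypoint to the nearest polyline vertex.

```python
import numpy as np

# Rasterized representation (assumed layout): a dense H x W occupancy
# grid, where every cell must be stored and processed.
H, W = 200, 200
raster_map = np.zeros((H, W), dtype=np.uint8)
raster_map[95:105, :] = 1  # a lane drawn as a band of filled cells

# Vectorized representation (assumed layout): the same lane as a short
# polyline of (x, y) vertices, preserving instance-level structure at a
# fraction of the memory footprint.
lane_polyline = np.array([[0.0, 50.0], [25.0, 50.0], [50.0, 50.0]])

# An instance-level planning constraint (simplified): distance from a
# hypothetical planned ego waypoint to the nearest lane vertex, usable
# as an explicit safety/boundary term during planning.
ego_waypoint = np.array([30.0, 52.0])
dist = np.linalg.norm(lane_polyline - ego_waypoint, axis=1).min()

print(raster_map.size)        # 40000 stored cells for the grid
print(lane_polyline.size)     # 6 stored values for the polyline
print(round(float(dist), 2))  # 5.39
```

The size gap (40000 values versus 6) is what drives the efficiency argument: downstream modules over vectors scale with the number of instances, not the grid resolution.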

URL

https://arxiv.org/abs/2303.12077

PDF

https://arxiv.org/pdf/2303.12077.pdf

