Paper Reading AI Learner

HybriMap: Hybrid Clues Utilization for Effective Vectorized HD Map Construction

2024-04-17 08:08:34
Chi Zhang, Qi Song, Feifei Li, Yongquan Chen, Rui Huang

Abstract

Constructing vectorized high-definition maps from surround-view cameras has garnered significant attention in recent years. However, the commonly employed multi-stage sequential workflow in prevailing approaches often leads to the loss of early-stage information, particularly in perspective-view features. Usually, such loss is observed as an instance missing or shape mismatching in the final birds-eye-view predictions. To address this concern, we propose a novel approach, namely \textbf{HybriMap}, which effectively exploits clues from hybrid features to ensure the delivery of valuable information. Specifically, we design the Dual Enhancement Module, to enable both explicit integration and implicit modification under the guidance of hybrid features. Additionally, the perspective keypoints are utilized as supervision, further directing the feature enhancement process. Extensive experiments conducted on existing benchmarks have demonstrated the state-of-the-art performance of our proposed approach.

Abstract (translated)

近年来,从环绕视像相机的构建矢量化高清晰度地图引起了广泛关注。然而,现有的方法中通常采用的多阶段序列工作流程往往会导致在平视 view 特征中丢失早期的信息,尤其是在视角 view 的特征中。通常,这种损失表现为最终鸟瞰预测中实例缺失或形状不匹配。为了应对这种担忧,我们提出了一个新颖的方法,即 \textbf{HybriMap},它有效利用混合特征的线索来确保传递有价值的信息。具体来说,我们设计了一个双增强模块,在混合特征的指导下实现明确的集成和隐含修改。此外,将视角关键点作为监督,进一步指导特征增强过程。对现有基准进行的大量实验证明了我们所提出方法的优越性能。

URL

https://arxiv.org/abs/2404.11155

PDF

https://arxiv.org/pdf/2404.11155.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot