Paper Reading AI Learner

Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting

2025-01-08 20:11:09
Kaouther Messaoud, Matthieu Cord, Alexandre Alahi

Abstract

Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainties, and handling complex interactions. These issues often stem from limitations such as complex architectures customized for a specific dataset and inefficient multimodal handling. We propose Perceiver with Register queries (PerReg+), a novel trajectory prediction framework that introduces: (1) Dual-Level Representation Learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing global context and fine-grained details. Additionally, our approach of reconstructing segment-level trajectories and lane segments from masked inputs with query drop enables effective use of contextual information and improves generalization; (2) Enhanced Multimodality using register-based queries and pretraining, eliminating the need for clustering and suppression; and (3) Adaptive Prompt Tuning during fine-tuning, freezing the main architecture and optimizing a small number of prompts for efficient adaptation. PerReg+ achieves new state-of-the-art performance on nuScenes [1], Argoverse 2 [2], and the Waymo Open Motion Dataset (WOMD) [3]. Remarkably, our pretrained model reduces the error by 6.8% on smaller datasets, and multi-dataset training enhances generalization. In cross-domain tests, PerReg+ reduces B-FDE by 11.8% compared to its non-pretrained variant.
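The adaptive prompt tuning described in point (3) can be illustrated with a minimal sketch: the pretrained predictor is frozen and only a small set of learnable prompt tokens is optimized during fine-tuning. The class and parameter names below (PromptTunedPredictor, num_prompts, embed_dim, pretrained_encoder, prediction_head) are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of prompt tuning with a frozen backbone (assumed interfaces).
import torch
import torch.nn as nn


class PromptTunedPredictor(nn.Module):
    def __init__(self, pretrained_encoder: nn.Module, prediction_head: nn.Module,
                 num_prompts: int = 8, embed_dim: int = 256):
        super().__init__()
        self.encoder = pretrained_encoder
        self.head = prediction_head
        # Small set of learnable prompt tokens prepended to the scene tokens.
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
        # Freeze the pretrained backbone; only the prompts (and the lightweight
        # head) receive gradients during fine-tuning.
        for p in self.encoder.parameters():
            p.requires_grad = False

    def forward(self, scene_tokens: torch.Tensor) -> torch.Tensor:
        # scene_tokens: (batch, num_tokens, embed_dim) agent/lane embeddings.
        batch = scene_tokens.size(0)
        prompts = self.prompts.expand(batch, -1, -1)
        tokens = torch.cat([prompts, scene_tokens], dim=1)
        features = self.encoder(tokens)
        return self.head(features)


# Fine-tuning then optimizes only the trainable parameters:
# model = PromptTunedPredictor(encoder, head)
# optimizer = torch.optim.AdamW(
#     [p for p in model.parameters() if p.requires_grad], lr=1e-4)
```

The design choice this illustrates is that adapting to a new dataset touches only a few prompt vectors rather than the full architecture, which is what makes the adaptation efficient.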

Abstract (translated)

Existing vehicle trajectory prediction models face challenges in generalization, prediction uncertainty, and handling complex interactions, often because complex architectures are customized for a specific dataset and multimodal handling is inefficient. We propose a novel trajectory prediction framework, Perceiver with Register queries (PerReg+), which introduces the following improvements: 1. Dual-level representation learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing both global context and fine-grained details; in addition, reconstructing segment-level trajectories and lane segments from masked inputs with a query-drop strategy makes effective use of contextual information and improves generalization. 2. Enhanced multimodality through register-based queries and pretraining, eliminating the need for clustering and suppression. 3. Adaptive Prompt Tuning during fine-tuning, which freezes the main architecture and optimizes a small number of prompts for efficient adaptation. PerReg+ achieves new state-of-the-art performance on nuScenes, Argoverse 2, and the Waymo Open Motion Dataset (WOMD). Notably, our pretrained model reduces error by 6.8% on smaller datasets, and multi-dataset training further improves generalization. In cross-domain tests, PerReg+ reduces B-FDE by 11.8% compared to its non-pretrained variant. With these improvements, PerReg+ not only increases prediction accuracy and efficiency but also strengthens the model's adaptability and robustness across different scenarios.

URL

https://arxiv.org/abs/2501.04815

PDF

https://arxiv.org/pdf/2501.04815.pdf

