Abstract
Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainty, and complex interactions. These shortcomings often stem from limitations such as complex architectures customized for a specific dataset and inefficient handling of multimodality. We propose Perceiver with Register queries (PerReg+), a novel trajectory prediction framework that introduces: (1) Dual-Level Representation Learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing both global context and fine-grained details. Additionally, reconstructing segment-level trajectories and lane segments from masked inputs with query drop enables effective use of contextual information and improves generalization; (2) Enhanced Multimodality using register-based queries and pretraining, eliminating the need for clustering and suppression; and (3) Adaptive Prompt Tuning during fine-tuning, freezing the main architecture and optimizing a small number of prompts for efficient adaptation. PerReg+ achieves new state-of-the-art performance on nuScenes [1], Argoverse 2 [2], and Waymo Open Motion Dataset (WOMD) [3]. Remarkably, our pretrained model reduces the error by 6.8% on smaller datasets, and multi-dataset training enhances generalization. In cross-domain tests, PerReg+ reduces B-FDE by 11.8% compared to its non-pretrained variant.
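The abstract does not specify how segments are formed or masked, so the following is only an illustrative sketch of segment-level masking, not the authors' implementation. The function name `mask_segments`, the segment length, the mask ratio, and the use of `None` as a mask placeholder are all assumptions made for this example. It splits a trajectory into fixed-length segments and hides a random subset of them; a reconstruction objective would then ask a model to recover the hidden points from the remaining context.

```python
import random


def mask_segments(trajectory, segment_len=5, mask_ratio=0.5, seed=0):
    """Mask whole segments of a trajectory (hypothetical helper).

    segment_len and mask_ratio are assumed values, not taken from the paper.
    Returns the masked trajectory and the indices of the masked segments.
    """
    rng = random.Random(seed)
    # Number of segments, counting a shorter final segment if needed.
    n_segments = (len(trajectory) + segment_len - 1) // segment_len
    n_masked = int(round(n_segments * mask_ratio))
    masked_ids = sorted(rng.sample(range(n_segments), n_masked))

    masked = list(trajectory)
    for seg in masked_ids:
        start = seg * segment_len
        end = min(start + segment_len, len(masked))
        for i in range(start, end):
            masked[i] = None  # placeholder for a learned mask token
    return masked, masked_ids


# Toy trajectory: 20 (x, y) points along a straight line.
trajectory = [(float(t), 0.5 * t) for t in range(20)]
masked, masked_ids = mask_segments(trajectory)
```

Masking contiguous segments rather than isolated points prevents a model from solving the task by interpolating between immediate neighbours, which is one plausible reason to reconstruct at the segment level.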
Abstract (translated)
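Existing vehicle trajectory prediction models face challenges in generalization, prediction uncertainty, and the handling of complex interactions, usually because their complex architectures are tailored to a specific dataset and their multimodal handling is inefficient. We propose a novel trajectory prediction framework, Perceiver with Register queries (PerReg+), which introduces the following improvements: 1. Dual-level representation learning through Self-Distillation (SD) and Masked Reconstruction (MR), capturing both global context and fine-grained details. In addition, reconstructing segment-level trajectories and lane segments from masked inputs, combined with a query drop strategy, makes effective use of contextual information and improves generalization; 2. Enhanced multimodality through register-based queries and pretraining, eliminating the need for clustering and suppression; 3. Adaptive Prompt Tuning during fine-tuning, which freezes the main architecture and optimizes a small number of prompts for efficient adaptation. PerReg+ achieves new state-of-the-art performance on nuScenes, Argoverse 2, and Waymo Open Motion Dataset (WOMD). Notably, our pretrained model reduces the error by 6.8% on smaller datasets, and multi-dataset training further improves generalization. In cross-domain tests, PerReg+ reduces B-FDE (a final displacement error metric) by 11.8% compared to its non-pretrained variant. With these improvements, PerReg+ not only raises prediction accuracy and efficiency but also strengthens the model's adaptability and robustness across different scenarios.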
URL
https://arxiv.org/abs/2501.04815