Paper Reading AI Learner

Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models

2024-05-08 09:28:04
Zhengxing Lan, Hongbo Li, Lingshan Liu, Bo Fan, Yisheng Lv, Yilong Ren, Zhiyong Cui

Abstract

Predicting the future trajectories of dynamic traffic actors is a cornerstone task in autonomous driving. Although notable existing efforts have achieved impressive performance improvements, a gap persists in scene cognition and in understanding complex traffic semantics. This paper proposes Traj-LLM, the first work to investigate the potential of using Large Language Models (LLMs), without explicit prompt engineering, to generate future motion from agents' past/observed trajectories and scene semantics. Traj-LLM starts with sparse context joint coding, which dissects the agent and scene features into a form that LLMs understand. On this basis, we explore LLMs' powerful comprehension abilities to capture a spectrum of high-level scene knowledge and interactive information. To emulate the human-like lane-focus cognitive function and enhance Traj-LLM's scene comprehension, we introduce lane-aware probabilistic learning powered by the pioneering Mamba module. Finally, a multi-modal Laplace decoder is designed to achieve scene-compliant multi-modal predictions. Extensive experiments demonstrate that Traj-LLM, fortified by LLMs' strong prior knowledge and understanding prowess, together with lane-aware probabilistic learning, outperforms state-of-the-art methods across evaluation metrics. Moreover, a few-shot analysis further substantiates Traj-LLM's performance: with just 50% of the dataset, it outperforms the majority of benchmarks that rely on the complete data. This study equips the trajectory prediction task with advanced capabilities inherent in LLMs, furnishing a more universal and adaptable solution for forecasting agent motion.
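The multi-modal Laplace decoder mentioned in the abstract can be illustrated with a minimal winner-takes-all sketch. This is an assumption-laden toy, not the paper's implementation: it assumes each predicted waypoint of each mode is parameterized as independent Laplace distributions over x and y (location `mu`, scale `b`), and that training regresses only the best-matching mode, as is common in multi-modal trajectory prediction.

```python
import math

def laplace_nll(x, mu, b):
    """Negative log-likelihood of scalar x under Laplace(mu, b)."""
    return math.log(2.0 * b) + abs(x - mu) / b

def best_mode_loss(gt, modes):
    """Winner-takes-all regression loss over K predicted modes.

    gt:    list of (x, y) ground-truth waypoints.
    modes: K trajectories, each a list of per-waypoint parameters
           (mu_x, b_x, mu_y, b_y).
    Returns (average NLL of the best mode, index of the best mode).
    """
    losses = []
    for traj in modes:
        nll = sum(
            laplace_nll(gx, mx, bx) + laplace_nll(gy, my, by)
            for (gx, gy), (mx, bx, my, by) in zip(gt, traj)
        )
        losses.append(nll / len(gt))
    k = min(range(len(losses)), key=losses.__getitem__)
    return losses[k], k
```

In practice a classification term over the K mode probabilities would be added alongside this regression loss; here only the Laplace regression piece is shown.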

URL

https://arxiv.org/abs/2405.04909

PDF

https://arxiv.org/pdf/2405.04909.pdf
