Paper Reading AI Learner

Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models

2024-05-08 09:28:04
Zhengxing Lan, Hongbo Li, Lingshan Liu, Bo Fan, Yisheng Lv, Yilong Ren, Zhiyong Cui

Abstract

Predicting the future trajectories of dynamic traffic actors is a cornerstone task in autonomous driving. Although notable existing efforts have achieved impressive performance improvements, a gap persists in scene cognition and in understanding complex traffic semantics. This paper proposes Traj-LLM, the first work to investigate the potential of using Large Language Models (LLMs), without explicit prompt engineering, to generate future motion from agents' past/observed trajectories and scene semantics. Traj-LLM starts with sparse context joint coding, which dissects the agent and scene features into a form that LLMs understand. On this basis, we explore LLMs' powerful comprehension abilities to capture a spectrum of high-level scene knowledge and interactive information. To emulate the human-like lane-focus cognitive function and enhance Traj-LLM's scene comprehension, we introduce lane-aware probabilistic learning powered by the pioneering Mamba module. Finally, a multi-modal Laplace decoder is designed to achieve scene-compliant multi-modal predictions. Extensive experiments demonstrate that Traj-LLM, fortified by LLMs' strong prior knowledge and understanding prowess, together with lane-aware probabilistic learning, outperforms state-of-the-art methods across evaluation metrics. Moreover, a few-shot analysis further substantiates Traj-LLM's performance: with just 50% of the dataset, it outperforms the majority of benchmarks that rely on the complete data. This study equips the trajectory prediction task with advanced capabilities inherent in LLMs, furnishing a more universal and adaptable solution for forecasting agent motion.
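The multi-modal Laplace decoder mentioned in the abstract can be illustrated with a minimal winner-takes-all sketch. This is an assumption-laden toy, not the paper's implementation: it assumes each predicted waypoint of each mode is parameterized as independent Laplace distributions over x and y (location `mu`, scale `b`), and that training regresses only the best-matching mode, as is common in multi-modal trajectory prediction.

```python
import math

def laplace_nll(x, mu, b):
    """Negative log-likelihood of scalar x under Laplace(mu, b)."""
    return math.log(2.0 * b) + abs(x - mu) / b

def best_mode_loss(gt, modes):
    """Winner-takes-all regression loss over K predicted modes.

    gt:    list of (x, y) ground-truth waypoints.
    modes: K trajectories, each a list of per-waypoint parameters
           (mu_x, b_x, mu_y, b_y).
    Returns (average NLL of the best mode, index of the best mode).
    """
    losses = []
    for traj in modes:
        nll = sum(
            laplace_nll(gx, mx, bx) + laplace_nll(gy, my, by)
            for (gx, gy), (mx, bx, my, by) in zip(gt, traj)
        )
        losses.append(nll / len(gt))
    k = min(range(len(losses)), key=losses.__getitem__)
    return losses[k], k
```

In practice a classification term over the K mode probabilities would be added alongside this regression loss; here only the Laplace regression piece is shown.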

URL

https://arxiv.org/abs/2405.04909

PDF

https://arxiv.org/pdf/2405.04909.pdf
