EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial Representations

2025-04-07 18:45:49
Yue Yao, Mohamed-Khalil Bouzidi, Daniel Goehring, Joerg Reichardt

Abstract

As the prediction horizon increases, predicting the future evolution of traffic scenes becomes increasingly difficult due to the multi-modal nature of agent motion. Most state-of-the-art (SotA) prediction models primarily focus on forecasting the most likely future. However, for the safe operation of autonomous vehicles, it is equally important to cover the distribution of plausible motion alternatives. To address this, we introduce EP-Diffuser, a novel parameter-efficient diffusion-based generative model designed to capture the distribution of possible traffic scene evolutions. Conditioned on road layout and agent history, our model acts as a predictor and generates diverse, plausible scene continuations. We benchmark EP-Diffuser against two SotA models on the Argoverse 2 dataset in terms of the accuracy and plausibility of their predictions. Despite its significantly smaller model size, our approach produces predictions that are both highly accurate and plausible. We further evaluate the model's generalization ability in an out-of-distribution (OoD) test setting using the Waymo Open dataset and show the superior robustness of our approach. The code and model checkpoints can be found here: this https URL.
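
The title attributes the model's parameter efficiency to polynomial representations of agent motion. As an illustrative aid only, the sketch below shows the general idea of compressing a fixed-horizon trajectory into a handful of polynomial coefficients and reconstructing it from them; the degree, horizon, and all names (fit_polynomial, eval_polynomial, POLY_DEGREE) are assumptions for illustration, not the paper's actual parameterization.

    # Illustrative sketch only -- not the authors' code. Shows how a future
    # trajectory can be compressed into polynomial coefficients and recovered.
    import numpy as np

    POLY_DEGREE = 5   # assumed degree; the paper's choice may differ
    HORIZON_S = 6.0   # Argoverse 2 forecasts 6 s of future motion
    STEPS = 60        # 6 s sampled at 10 Hz

    def fit_polynomial(traj_xy: np.ndarray) -> np.ndarray:
        """Least-squares fit of x(t) and y(t); traj_xy has shape (STEPS, 2).

        Returns a (2, POLY_DEGREE + 1) coefficient matrix.
        """
        t = np.linspace(0.0, HORIZON_S, STEPS)
        coeffs_x = np.polyfit(t, traj_xy[:, 0], POLY_DEGREE)
        coeffs_y = np.polyfit(t, traj_xy[:, 1], POLY_DEGREE)
        return np.stack([coeffs_x, coeffs_y])

    def eval_polynomial(coeffs: np.ndarray) -> np.ndarray:
        """Reconstruct a (STEPS, 2) trajectory from the coefficient matrix."""
        t = np.linspace(0.0, HORIZON_S, STEPS)
        return np.stack([np.polyval(coeffs[0], t),
                         np.polyval(coeffs[1], t)], axis=-1)

    # Example: a gentle left turn, compressed from 60 waypoints to 12 numbers.
    t = np.linspace(0.0, HORIZON_S, STEPS)
    traj = np.stack([10.0 * t, 0.3 * t**2], axis=-1)
    coeffs = fit_polynomial(traj)
    recon = eval_polynomial(coeffs)
    print("max reconstruction error:", np.abs(recon - traj).max())

Operating in such a low-dimensional coefficient space instead of on raw waypoint sequences is one plausible way a diffusion model could stay small while still producing smooth, diverse scene continuations; how EP-Diffuser actually conditions on road layout and agent history is described in the paper itself.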

URL

https://arxiv.org/abs/2504.05422

PDF

https://arxiv.org/pdf/2504.05422.pdf

