TrajPRed: Trajectory Prediction with Region-based Relation Learning

2024-04-10 12:31:43
Chen Zhou, Ghassan AlRegib, Armin Parchami, Kunjan Singh

Abstract

Forecasting human trajectories in traffic scenes is critical for safety in mixed or fully autonomous systems. Human future trajectories are driven by two major stimuli: social interactions and stochastic goals. Reliable forecasting therefore needs to capture both. Edge-based relation modeling represents social interactions using pairwise correlations from precise individual states. However, edge-based relations can be vulnerable to perturbations. To alleviate this issue, we propose a region-based relation learning paradigm that models social interactions via region-wise dynamics of joint states, i.e., the changes in the density of crowds. In particular, region-wise joint information about agents is encoded within convolutional feature grids. Social relations are modeled by relating the temporal changes of local joint information from a global perspective. We show that region-based relations are less susceptible to perturbations. To account for stochastic individual goals, we use a conditional variational autoencoder to realize multi-goal estimation and diverse future prediction. Specifically, we perform variational inference via the latent distribution, which is conditioned on the correlation between input states and associated target goals. Sampling from the latent distribution enables the framework to reliably capture the stochastic behavior in test data. We integrate multi-goal estimation and region-based relation learning to model the two stimuli, social interactions and stochastic goals, in a single prediction framework. We evaluate our framework on the ETH-UCY dataset and the Stanford Drone Dataset (SDD). We show that the diverse predictions better fit the ground truth when the relation module is incorporated. Our framework outperforms state-of-the-art models on SDD by $27.61\%$/$18.20\%$ on the ADE/FDE metrics.
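
The ADE/FDE metrics quoted above are the average and final displacement errors between a predicted trajectory and the ground truth. The following is a minimal sketch of how they are commonly computed for a set of sampled futures. It is not taken from the paper: the function names, the 2D point representation, and the best-of-K convention (taking the minimum over K samples) are assumptions made for illustration, since the abstract does not state the exact evaluation protocol.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def displacement_errors(pred: Trajectory, truth: Trajectory) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against the ground truth."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between prediction and ground truth at every time step.
    dists = [math.hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(pred, truth)]
    # ADE averages over all steps; FDE looks only at the final step.
    return sum(dists) / len(dists), dists[-1]


def best_of_k(samples: Sequence[Trajectory], truth: Trajectory) -> Tuple[float, float]:
    """Minimum ADE and minimum FDE over K sampled futures (assumed convention).

    The two minima are taken independently, so they may come from different samples.
    """
    errors = [displacement_errors(sample, truth) for sample in samples]
    return min(e[0] for e in errors), min(e[1] for e in errors)


# Hypothetical toy data: one ground-truth path and two sampled predictions.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)],
]
min_ade, min_fde = best_of_k(samples, truth)
```

In this toy case the second sample is closer to the ground truth at every step, so both minima come from it. Under this convention, a model that draws diverse samples from a latent distribution is rewarded when at least one sample lands near the true future.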

URL

https://arxiv.org/abs/2404.06971

PDF

https://arxiv.org/pdf/2404.06971.pdf

