Paper Reading AI Learner

A Hierarchical Hybrid Learning Framework for Multi-agent Trajectory Prediction

2023-03-22 02:47:42
Yujun Jiao, Mingze Miao, Zhishuai Yin, Chunyuan Lei, Xu Zhu, Linzhen Nie, Bo Tao

Abstract

Accurate and robust trajectory prediction of neighboring agents is critical for autonomous vehicles traversing complex scenes. Most methods proposed in recent years are deep learning-based, owing to their strength in encoding complex interactions. However, they often generate implausible predictions because they rely heavily on past observations and cannot effectively capture transient and contingent interactions from sparse samples. In this paper, we propose a hierarchical hybrid framework of deep learning (DL) and reinforcement learning (RL) for multi-agent trajectory prediction, to cope with the challenge of predicting motions shaped by multi-scale interactions. In the DL stage, the traffic scene is divided into multiple intermediate-scale heterogeneous graphs, on which Transformer-style GNNs are adopted to encode heterogeneous interactions at the intermediate and global levels. In the RL stage, we divide the traffic scene into local sub-scenes using the key future points predicted in the DL stage. To emulate the motion planning procedure and thereby produce trajectory predictions, a Transformer-based Proximal Policy Optimization (PPO) algorithm incorporating a vehicle kinematics model is devised to plan motions under the dominant influence of microscopic interactions. A multi-objective reward is designed to balance agent-centric accuracy and scene-wise compatibility. Experimental results show that our proposal matches the state of the art on the Argoverse forecasting benchmark. The visualized results also reveal that the hierarchical learning framework captures the multi-scale interactions and improves the feasibility and compliance of the predicted trajectories.
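The abstract says the RL stage plans motions through a vehicle kinematics model but does not say which one. A minimal sketch, assuming the common rear-axle kinematic bicycle model with explicit Euler integration: the policy would emit (acceleration, steering) actions and the model turns them into positions. The names VehicleState, bicycle_step and rollout, the 2.8 m wheelbase and the 0.1 s step (chosen to match Argoverse's 10 Hz sampling) are illustrative assumptions, not details from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class VehicleState:
    x: float    # position [m]
    y: float    # position [m]
    yaw: float  # heading [rad]
    v: float    # speed [m/s]


def bicycle_step(state: VehicleState, accel: float, steer: float,
                 dt: float = 0.1, wheelbase: float = 2.8) -> VehicleState:
    """Advance a hypothetical rear-axle kinematic bicycle model by one step.

    accel is longitudinal acceleration [m/s^2], steer is the front-wheel
    angle [rad]. dt and wheelbase are illustrative defaults (assumptions).
    All updates use the state at the start of the step (explicit Euler).
    """
    x = state.x + state.v * math.cos(state.yaw) * dt
    y = state.y + state.v * math.sin(state.yaw) * dt
    yaw = state.yaw + state.v / wheelbase * math.tan(steer) * dt
    v = max(0.0, state.v + accel * dt)  # clamp speed: no reversing
    return VehicleState(x, y, yaw, v)


def rollout(state: VehicleState, actions, dt: float = 0.1,
            wheelbase: float = 2.8):
    """Turn a sequence of (accel, steer) actions into (x, y) waypoints."""
    points = []
    for accel, steer in actions:
        state = bicycle_step(state, accel, steer, dt, wheelbase)
        points.append((state.x, state.y))
    return points
```

For example, thirty zero-acceleration, zero-steering steps from 10 m/s move the vehicle 30 m straight ahead. Constraining actions through such a model is one way an RL planner can keep predicted trajectories physically feasible.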
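The RL stage is trained with Proximal Policy Optimization (PPO). The sketch below shows only the standard PPO-Clip surrogate objective on plain Python lists; it is not the authors' Transformer-based policy or training loop. The function name ppo_clip_objective is hypothetical, and clip_eps = 0.2 is a common default rather than a value reported in the paper.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_logp: Sequence[float], old_logp: Sequence[float],
                       advantages: Sequence[float],
                       clip_eps: float = 0.2) -> float:
    """Mean PPO-Clip surrogate objective (to be maximized).

    L = mean_t min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t),
    with r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t)
             = exp(new_logp_t - old_logp_t).
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio r_t
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum gives a pessimistic bound on the improvement.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

With a ratio of 2 and a positive advantage of 1, the clipped term caps the contribution at 1 + eps = 1.2, while a negative advantage keeps the unclipped, more pessimistic value of -2. This asymmetry is what keeps each policy update small.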
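The abstract describes a multi-objective reward that balances agent-centric accuracy against scene-wise compatibility, without giving its terms. A minimal sketch, assuming accuracy is the negative distance to the ground-truth position and compatibility is a penalty for entering a safety radius around other agents, combined as a weighted sum. The function step_reward, the weights w_acc and w_compat, and the 2 m safe_dist are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def step_reward(pred: Point, gt: Point, others: List[Point],
                w_acc: float = 1.0, w_compat: float = 1.0,
                safe_dist: float = 2.0) -> float:
    """Hypothetical per-step reward: weighted sum of two objectives.

    Weights and safe_dist are illustrative assumptions, not paper values.
    """
    # Agent-centric accuracy: penalize Euclidean distance to ground truth.
    acc = -math.dist(pred, gt)
    # Scene-wise compatibility: penalize how far the predicted position
    # intrudes into a safety radius around each other agent.
    compat = -sum(max(0.0, safe_dist - math.dist(pred, o)) for o in others)
    return w_acc * acc + w_compat * compat
```

A prediction 5 m from the ground truth with no nearby agents scores -5.0; placing another agent 1 m away adds a further -1.0 penalty. Tuning w_acc against w_compat is where the trade-off between matching the recorded trajectory and staying consistent with the rest of the scene would be set.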

URL

https://arxiv.org/abs/2303.12274

PDF

https://arxiv.org/pdf/2303.12274.pdf

