Temporal Multimodal Fusion for Driver Behavior Prediction Tasks using Gated Recurrent Fusion Units

2019-10-01 19:34:09
Athma Narayanan, Avinash Siravuru, Behzad Dariush

Abstract

The Tactical Driver Behavior modeling problem requires an understanding of driver actions in complicated urban scenarios from rich multimodal signals, including video, LiDAR, and CAN bus data streams. However, the majority of deep learning research focuses either on learning the vehicle/environment state (sensor fusion) or the driver policy (from temporal data), but not both. Learning both tasks end-to-end offers the richest distillation of knowledge, but presents challenges in formulation and successful training. In this work, we propose promising first steps in this direction. Inspired by the gating mechanisms in LSTM, we propose gated recurrent fusion units (GRFU) that learn fusion weighting and temporal weighting simultaneously. We demonstrate its superior performance over multimodal and temporal baselines in supervised regression and classification tasks, all in the realm of autonomous navigation. We note a 10% improvement in the mAP score over the state of the art for tactical driver behavior classification on the HDD dataset and a 20% drop in overall mean squared error for steering action regression on the TORCS dataset.
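
The abstract does not give the GRFU equations, so the following is only a toy sketch of the general idea it describes: a single recurrent step that learns a per-modality fusion weighting and a temporal gate together. Everything here is an assumption made for illustration, including the class name `ToyGatedFusionCell`, the softmax fusion weights, the GRU-style update gate, the feature sizes, and the random untrained parameters. It should not be read as the authors' formulation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ToyGatedFusionCell:
    """Hypothetical gated fusion cell; NOT the GRFU from the paper."""

    def __init__(self, input_dims, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One candidate projection per modality: [x_m, h_prev] -> hidden_dim.
        self.W_cand = [rng.normal(0.0, 0.1, (d + hidden_dim, hidden_dim))
                       for d in input_dims]
        # One scalar fusion score per modality: [x_m, h_prev] -> score.
        self.W_fuse = [rng.normal(0.0, 0.1, (d + hidden_dim,))
                       for d in input_dims]
        # Temporal (update) gate computed from [fused, h_prev].
        self.W_update = rng.normal(0.0, 0.1, (2 * hidden_dim, hidden_dim))

    def step(self, inputs, h_prev):
        cands, scores = [], []
        for x, Wc, Wf in zip(inputs, self.W_cand, self.W_fuse):
            z = np.concatenate([x, h_prev])
            cands.append(np.tanh(z @ Wc))   # per-modality candidate state
            scores.append(z @ Wf)           # per-modality fusion score
        scores = np.array(scores)
        # Fusion weighting: softmax over modalities (assumed choice).
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        fused = sum(w * c for w, c in zip(weights, cands))
        # Temporal weighting: gate between the fused candidate and the past state.
        u = sigmoid(np.concatenate([fused, h_prev]) @ self.W_update)
        h = u * fused + (1.0 - u) * h_prev
        return h, weights


# Usage: two made-up modalities (an 8-d "video" feature, a 3-d "CAN bus" feature).
cell = ToyGatedFusionCell(input_dims=[8, 3], hidden_dim=4)
h = np.zeros(4)
rng = np.random.default_rng(1)
for _ in range(5):
    video_feat = rng.normal(size=8)
    can_feat = rng.normal(size=3)
    h, w = cell.step([video_feat, can_feat], h)
```

In this sketch the fusion weights `w` sum to one across modalities at every time step, while the update gate `u` decides how much of the fused candidate replaces the previous hidden state. In a trained model both sets of parameters would be learned jointly from the task loss, which is the "simultaneous" learning the abstract refers to.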

URL

https://arxiv.org/abs/1910.00628

PDF

https://arxiv.org/pdf/1910.00628

