Density-Guided Label Smoothing for Temporal Localization of Driving Actions

2024-03-11 11:06:41
Tunc Alkanat, Erkut Akdag, Egor Bondarev, Peter H. N. De With

Abstract

Temporal localization of driving actions plays a crucial role in advanced driver-assistance systems and naturalistic driving studies. However, the task is challenging because of strict requirements for robustness, reliability and accurate localization. In this work, we focus on improving overall performance by efficiently utilizing video action recognition networks and adapting them to the problem of action localization. To this end, we first develop a density-guided label smoothing technique based on label probability distributions, which facilitates better learning from boundary video segments that typically include multiple labels. Second, we design a post-processing step that efficiently fuses information from video segments and multiple camera views into scene-level predictions, which facilitates the elimination of false positives. Our methodology achieves competitive performance on the A2 test set of the naturalistic driving action recognition track of the 2022 NVIDIA AI City Challenge, with an F1 score of 0.271.
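
The abstract does not give the exact formulation, so the following is only a minimal sketch of one way the two ideas could look in practice. It assumes that the "density" of a label is the fraction of frames in a video segment that carry that label, and that camera views are fused by a plain average of per-class probabilities. The function names, the epsilon parameter and the averaging rule are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def density_soft_label(frame_labels, num_classes, epsilon=0.0):
    """Build a soft training target for one video segment.

    Assumption: each class receives a weight equal to the fraction of
    frames in the segment annotated with that class. `epsilon` optionally
    mixes in a uniform distribution, as in standard label smoothing.
    """
    if not frame_labels:
        raise ValueError("segment must contain at least one frame")
    counts = Counter(frame_labels)
    n = len(frame_labels)
    density = [counts.get(c, 0) / n for c in range(num_classes)]
    return [(1.0 - epsilon) * d + epsilon / num_classes for d in density]


def fuse_views(view_probs):
    """Fuse per-class probabilities of one segment across camera views.

    Assumption: a simple mean over views; the paper's post-processing
    step may use a different rule.
    """
    num_views = len(view_probs)
    return [sum(column) / num_views for column in zip(*view_probs)]


# A 16-frame boundary segment: 12 frames of class 0, then 4 frames of class 2.
segment = [0] * 12 + [2] * 4
soft = density_soft_label(segment, num_classes=3)

# Two hypothetical camera views scoring the same segment over two classes.
fused = fuse_views([[0.5, 0.5], [1.0, 0.0]])
```

Under these assumptions, a segment that lies entirely inside one action gets an ordinary one-hot target, while a boundary segment gets a target that reflects how much of each action it contains.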

URL

https://arxiv.org/abs/2403.06616

PDF

https://arxiv.org/pdf/2403.06616.pdf

