Abstract
Temporal localization of driving actions plays a crucial role in advanced driver-assistance systems and naturalistic driving studies. However, this is a challenging task due to strict requirements for robustness, reliability and accurate localization. In this work, we focus on improving the overall performance by efficiently utilizing video action recognition networks and adapting these to the problem of action localization. To this end, we first develop a density-guided label smoothing technique based on label probability distributions to facilitate better learning from boundary video-segments that typically include multiple labels. Second, we design a post-processing step to efficiently fuse information from video-segments and multiple camera views into scene-level predictions, which helps eliminate false positives. Our methodology yields competitive performance on the A2 test set of the naturalistic driving action recognition track of the 2022 NVIDIA AI City Challenge, with an F1 score of 0.271.
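The abstract does not give formulas for either step, so the following is only a minimal sketch of one plausible reading. It assumes that the "label probability distribution" of a segment is the fraction of its frames carrying each class label, and that multi-view fusion is a plain average of per-class probabilities. The function names `density_soft_label` and `fuse_views` are hypothetical and do not come from the paper.

```python
from collections import Counter


def density_soft_label(frame_labels, num_classes):
    """Assumed reading of density-guided smoothing: the soft target for a
    segment is the fraction of its frames that carry each class label."""
    if not frame_labels:
        raise ValueError("segment must contain at least one frame")
    counts = Counter(frame_labels)
    total = len(frame_labels)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def fuse_views(view_probs):
    """Assumed fusion rule: average per-class probabilities over camera views.
    The paper's actual post-processing may differ."""
    num_views = len(view_probs)
    num_classes = len(view_probs[0])
    return [sum(v[c] for v in view_probs) / num_views for c in range(num_classes)]


# A boundary segment with three frames of class 0 and one frame of class 1
# gets a soft target instead of a one-hot label.
boundary_target = density_soft_label([0, 0, 0, 1], num_classes=3)

# Two camera views that disagree are merged into one scene-level prediction.
scene_probs = fuse_views([[0.5, 0.5], [1.0, 0.0]])
```

Under these assumptions, a segment that lies entirely inside one action keeps a one-hot target, while a segment straddling an action boundary is trained against a target that reflects how much of each action it contains.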
URL
https://arxiv.org/abs/2403.06616