ALERT Open Dataset and Input-Size-Agnostic Vision Transformer for Driver Activity Recognition using IR-UWB

2025-12-13 06:33:02
Jeongjun Park, Sunwook Hwang, Hyeonho Noh, Jin Mo Yang, Hyun Jong Yang, Saewoong Bahk

Abstract

Distracted driving contributes to fatal crashes worldwide. To address this, researchers are using driver activity recognition (DAR) with impulse radio ultra-wideband (IR-UWB) radar, which offers advantages such as interference resistance, low power consumption, and privacy preservation. However, two challenges limit its adoption: the lack of large-scale real-world UWB datasets covering diverse distracted driving behaviors, and the difficulty of adapting fixed-input Vision Transformers (ViTs) to UWB radar data with non-standard dimensions. This work addresses both challenges. We present the ALERT dataset, which contains 10,220 radar samples of seven distracted driving activities collected in real driving conditions. We also propose the input-size-agnostic Vision Transformer (ISA-ViT), a framework designed for radar-based DAR. The proposed method resizes UWB data to meet ViT input requirements while preserving radar-specific information such as Doppler shifts and phase characteristics. By adjusting patch configurations and leveraging pre-trained positional embedding vectors (PEVs), ISA-ViT overcomes the limitations of naive resizing approaches. In addition, a domain fusion strategy combines range- and frequency-domain features to further improve classification performance. Comprehensive experiments demonstrate that ISA-ViT achieves a 22.68% accuracy improvement over an existing ViT-based approach for UWB-based DAR. By publicly releasing the ALERT dataset and detailing our input-size-agnostic strategy, this work facilitates the development of more robust and scalable distracted driving detection systems for real-world deployment.
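One way to reuse pre-trained positional embeddings on a non-standard patch grid is to resample them onto the new grid. The sketch below shows that generic technique in NumPy; it is not the ISA-ViT procedure, which the abstract does not specify. The function name `resize_pos_embed`, the 14 x 14 source grid with 768-dimensional embeddings (typical of a ViT-B/16 at 224 x 224 input), and the 8 x 20 target grid are all assumptions chosen for illustration, and the class-token embedding is assumed to be handled separately.

```python
import numpy as np

def resize_pos_embed(pos_embed, old_grid, new_grid):
    """Bilinearly resample a (old_h*old_w, dim) table of patch positional
    embeddings onto a new (new_h, new_w) patch grid (hypothetical helper)."""
    old_h, old_w = old_grid
    new_h, new_w = new_grid
    dim = pos_embed.shape[1]
    grid = pos_embed.reshape(old_h, old_w, dim)

    # Sample positions in the old grid; corner patches map to corner patches.
    ys = np.linspace(0, old_h - 1, new_h)
    xs = np.linspace(0, old_w - 1, new_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, old_h - 1)
    x1 = np.minimum(x0 + 1, old_w - 1)
    wy = (ys - y0)[:, None, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :, None]  # horizontal interpolation weights

    # Interpolate along x on the two neighbouring rows, then along y.
    top = grid[y0][:, x0] * (1 - wx) + grid[y0][:, x1] * wx
    bottom = grid[y1][:, x0] * (1 - wx) + grid[y1][:, x1] * wx
    out = top * (1 - wy) + bottom * wy
    return out.reshape(new_h * new_w, dim)

# Stand-in for pre-trained embeddings: random values, assumed 14x14 grid, dim 768.
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(14 * 14, 768))

# Assumed non-square patch grid for a radar map (8 x 20 patches).
resized = resize_pos_embed(pretrained, (14, 14), (8, 20))
print(resized.shape)
```

Resampling the embedding table rather than the radar map leaves the input samples untouched, which is one motivation for avoiding naive image-style resizing of the data itself.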

URL

https://arxiv.org/abs/2512.12206

PDF

https://arxiv.org/pdf/2512.12206.pdf

