Paper Reading AI Learner

LiteFat: Lightweight Spatio-Temporal Graph Learning for Real-Time Driver Fatigue Detection

2025-07-29 12:37:53
Jing Ren, Suyu Ma, Hong Jia, Xiwei Xu, Ivan Lee, Haytham Fayek, Xiaodong Li, Feng Xia

Abstract

Detecting driver fatigue is critical for road safety, as drowsy driving remains a leading cause of traffic accidents. Many existing solutions rely on computationally demanding deep learning models, which incur high latency and are unsuitable for resource-constrained embedded robotic devices (such as intelligent vehicles) where rapid detection is necessary to prevent accidents. This paper introduces LiteFat, a lightweight spatio-temporal graph learning model designed to detect driver fatigue efficiently while maintaining high accuracy and low computational demands. LiteFat converts streaming video data into spatio-temporal graphs (STGs) using facial landmark detection, which focuses on key motion patterns and reduces unnecessary data processing. LiteFat uses MobileNet to extract facial features and build a feature matrix for the STG. A lightweight spatio-temporal graph neural network is then employed to identify signs of fatigue with minimal processing and low latency. Experimental results on benchmark datasets show that LiteFat performs competitively while significantly reducing computational complexity and latency compared with current state-of-the-art methods. This work enables the development of real-time, resource-efficient fatigue detection systems that can be deployed on embedded robotic devices.
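The abstract outlines a pipeline in which per-frame facial landmarks become graph nodes, connected by spatial edges within a frame and temporal edges across frames. The paper does not specify the construction details, so the sketch below is a generic illustration under assumed choices: k-nearest-neighbour spatial edges and same-landmark temporal links (the function name, `k`, and `temporal_window` are all hypothetical, and the node feature matrix would come from a MobileNet backbone rather than raw coordinates).

```python
import numpy as np

def build_spatio_temporal_graph(landmarks, k=3, temporal_window=1):
    """Illustrative STG construction from per-frame facial landmarks.

    landmarks: array of shape (T, N, 2) -- T frames, N (x, y) landmark points.
    Returns a symmetric adjacency matrix over T*N nodes with spatial edges
    (k nearest landmarks within a frame) and temporal edges (the same
    landmark index linked across frames temporal_window apart).
    """
    T, N, _ = landmarks.shape
    A = np.zeros((T * N, T * N), dtype=np.float32)

    # Spatial edges: connect each landmark to its k nearest in-frame neighbours.
    for t in range(T):
        pts = landmarks[t]                                   # (N, 2)
        dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
        for i in range(N):
            for j in np.argsort(dist[i])[1:k + 1]:           # skip self at rank 0
                A[t * N + i, t * N + j] = 1.0
                A[t * N + j, t * N + i] = 1.0

    # Temporal edges: link the same landmark across frames temporal_window apart.
    for t in range(T - temporal_window):
        for i in range(N):
            u, v = t * N + i, (t + temporal_window) * N + i
            A[u, v] = A[v, u] = 1.0
    return A
```

In an end-to-end system, each node's feature vector would be a MobileNet embedding of the facial patch around that landmark, and the resulting adjacency and feature matrices would feed the lightweight spatio-temporal GNN described in the abstract.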


URL

https://arxiv.org/abs/2507.21756

PDF

https://arxiv.org/pdf/2507.21756.pdf

