Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed Video

2023-03-28 15:35:25
Wenzheng Zeng, Yang Xiao, Sicheng Wei, Jinfang Gan, Xintao Zhang, Zhiguo Cao, Zhiwen Fang, Joey Tianyi Zhou

Abstract

Real-time eyeblink detection in the wild supports applications such as fatigue detection, face anti-spoofing, and emotion analysis. Existing research generally focuses on single-person cases in trimmed video. However, the multi-person scenario in untrimmed video is also important for practical applications and has not yet received much attention. To address this, we study this problem for the first time, with contributions on dataset, theory, and practice. In particular, we propose MPEblink, a large-scale dataset of 686 untrimmed videos containing 8748 eyeblink events under multi-person conditions. The samples are taken from unconstrained films to reflect "in the wild" characteristics. We also propose a real-time multi-person eyeblink detection method. Unlike existing counterparts, our method runs in a one-stage spatio-temporal manner and can be learned end to end. Specifically, it simultaneously addresses the sub-tasks of face detection, face tracking, and human instance-level eyeblink detection. This paradigm has two main advantages: (1) eyeblink features can benefit from the face's global context (e.g., head pose and illumination condition) through joint optimization and interaction, and (2) addressing these sub-tasks in parallel rather than sequentially saves considerable time, which helps meet the real-time requirement. Experiments on MPEblink verify the essential challenges of real-time multi-person eyeblink detection in the wild for untrimmed video. Our method also outperforms existing approaches by large margins while maintaining a high inference speed.
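
To make "human instance-level eyeblink detection" in untrimmed video more concrete, the sketch below shows one simple way per-frame blink scores for each tracked person could be grouped into eyeblink events (start and end frame). This is only an illustration and not the authors' method: the function names, the 0.5 threshold, the minimum event length, and the assumption that a model emits one score per frame per person are all hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, int]  # (start_frame, end_frame), both inclusive


def scores_to_events(
    scores: List[float],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
    min_len: int = 2,        # hypothetical minimum event length in frames
) -> List[Event]:
    """Group consecutive frames with score >= threshold into events."""
    events: List[Event] = []
    start = None  # first frame of the run currently being tracked
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at frame i - 1; keep it only if long enough.
            if i - start >= min_len:
                events.append((start, i - 1))
            start = None
    # Close a run that extends to the last frame of the video.
    if start is not None and len(scores) - start >= min_len:
        events.append((start, len(scores) - 1))
    return events


def detect_blinks(
    per_person_scores: Dict[int, List[float]],
    threshold: float = 0.5,
    min_len: int = 2,
) -> Dict[int, List[Event]]:
    """Apply the grouping independently to every tracked person ID."""
    return {
        person_id: scores_to_events(scores, threshold, min_len)
        for person_id, scores in per_person_scores.items()
    }


if __name__ == "__main__":
    # Two hypothetical face tracks with made-up per-frame blink scores.
    tracks = {
        0: [0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.6, 0.9],
        1: [0.0, 0.1, 0.2, 0.1, 0.0, 0.1, 0.0, 0.2],
    }
    print(detect_blinks(tracks))
```

Keeping events separate per person ID is what distinguishes the multi-person setting from the single-person one: each track yields its own list of blink intervals over the whole untrimmed video.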


URL

https://arxiv.org/abs/2303.16053

PDF

https://arxiv.org/pdf/2303.16053.pdf

