
M3FAS: An Accurate and Robust MultiModal Mobile Face Anti-Spoofing System

2023-01-30 12:37:04
Chenqi Kong, Kexin Zheng, Yibing Liu, Shiqi Wang, Anderson Rocha, Haoliang Li

Abstract

Face presentation attacks (FPA), also known as face spoofing, have raised increasing public concern through various malicious applications, such as financial fraud and privacy leakage. Therefore, safeguarding face recognition systems against FPA is of utmost importance. Although existing learning-based face anti-spoofing (FAS) models can achieve outstanding detection performance, they lack generalization capability and suffer significant performance drops in unforeseen environments. Many methodologies seek to use auxiliary modality data (e.g., depth and infrared maps) during presentation attack detection (PAD) to address this limitation. However, these methods can be limited since (1) they require specific sensors such as depth and infrared cameras for data capture, which are rarely available on commodity mobile devices, and (2) they cannot work properly in practical scenarios when either modality is missing or of poor quality. In this paper, we devise an accurate and robust MultiModal Mobile Face Anti-Spoofing system named M3FAS to overcome the issues above. The innovation of this work mainly lies in the following aspects: (1) To achieve robust PAD, our system combines visual and auditory modalities using three pervasively available sensors: camera, speaker, and microphone; (2) We design a novel two-branch neural network with three hierarchical feature aggregation modules to perform cross-modal feature fusion; (3) We propose a multi-head training strategy. The model outputs three predictions from the vision, acoustic, and fusion heads, enabling a more flexible PAD. Extensive experiments have demonstrated the accuracy, robustness, and flexibility of M3FAS under various challenging experimental settings.
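
The multi-head idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the loss function, the head weights, or how a head is chosen at inference, so the binary cross-entropy loss, the equal weights, and the helper names `bce`, `multi_head_loss`, and `pick_head` are all assumptions made for illustration only.

```python
import math


def bce(p: float, y: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one predicted probability p and label y (0 or 1)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def multi_head_loss(p_vision: float, p_acoustic: float, p_fusion: float,
                    y: int, weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of per-head losses (assumed form; weights are hypothetical)."""
    w_v, w_a, w_f = weights
    return (w_v * bce(p_vision, y)
            + w_a * bce(p_acoustic, y)
            + w_f * bce(p_fusion, y))


def pick_head(vision_ok: bool, acoustic_ok: bool) -> str:
    """Choose which head's prediction to trust, given which modalities are usable.

    Hypothetical selection rule: use the fusion head when both modalities are
    usable, otherwise fall back to the head of the remaining modality.
    """
    if vision_ok and acoustic_ok:
        return "fusion"
    if vision_ok:
        return "vision"
    if acoustic_ok:
        return "acoustic"
    raise ValueError("no usable modality")


if __name__ == "__main__":
    # One training sample: all three heads are supervised with the same label.
    print(multi_head_loss(0.9, 0.7, 0.95, y=1))
    # At inference, a noisy microphone recording leaves only the vision head.
    print(pick_head(vision_ok=True, acoustic_ok=False))
```

Supervising every head with the same label is what would let a single-modality head remain usable on its own when the other modality is missing or of poor quality, which is the flexibility the abstract refers to.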


URL

https://arxiv.org/abs/2301.12831

PDF

https://arxiv.org/pdf/2301.12831.pdf

