RetinaFace: Single-stage Dense Face Localisation in the Wild

2019-05-02 09:45:23
Jiankang Deng, Jia Guo, Yuxiang Zhou, Jinke Yu, Irene Kotsia, Stefanos Zafeiriou

Abstract

Though tremendous strides have been made in uncontrolled face detection, accurate and efficient face localisation in the wild remains an open challenge. This paper presents a robust single-stage face detector, named RetinaFace, which performs pixel-wise face localisation on various scales of faces by taking advantage of joint extra-supervised and self-supervised multi-task learning. Specifically, we make contributions in the following five aspects: (1) We manually annotate five facial landmarks on the WIDER FACE dataset and observe significant improvement in hard face detection with the assistance of this extra supervision signal. (2) We further add a self-supervised mesh decoder branch that predicts pixel-wise 3D face shape information in parallel with the existing supervised branches. (3) On the WIDER FACE hard test set, RetinaFace outperforms the state-of-the-art average precision (AP) by $1.1\%$ (achieving an AP of $91.4\%$). (4) On the IJB-C test set, RetinaFace enables state-of-the-art methods (ArcFace) to improve their results in face verification (TAR=$89.59\%$ for FAR=1e-6). (5) By employing light-weight backbone networks, RetinaFace can run in real time on a single CPU core for a VGA-resolution image. Extra annotations and code will be released to facilitate future research.
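
The verification figure in contribution (4) is a true accept rate (TAR) reported at a fixed false accept rate (FAR). As a rough illustration of how such a number is read, the sketch below picks the strictest similarity threshold that keeps the FAR within a budget and then measures the TAR at that threshold. The function name `tar_at_far` and the toy scores are hypothetical. This is a generic definition of the metric, not the IJB-C evaluation protocol and not code from the paper.

```python
def tar_at_far(genuine_scores, impostor_scores, far_target):
    """Return (TAR, threshold) for a FAR budget (illustrative sketch only).

    genuine_scores: similarity scores of same-identity pairs.
    impostor_scores: similarity scores of different-identity pairs.
    A pair is accepted when its score is strictly above the threshold.
    """
    impostors = sorted(impostor_scores, reverse=True)
    # Number of impostor pairs that may be accepted without exceeding the budget.
    allowed = int(far_target * len(impostors))
    # Using the (allowed + 1)-th highest impostor score as the threshold means
    # at most `allowed` impostor scores lie strictly above it.
    threshold = impostors[allowed] if allowed < len(impostors) else float("-inf")
    accepted = sum(1 for score in genuine_scores if score > threshold)
    return accepted / len(genuine_scores), threshold


# Toy scores, made up for illustration (not results from the paper).
genuine = [0.9, 0.8, 0.6, 0.3]
impostor = [0.7, 0.4, 0.2, 0.1]
tar, threshold = tar_at_far(genuine, impostor, far_target=0.25)
print(f"TAR={tar:.2f} at threshold={threshold}")
```

With a budget as small as FAR=1e-6, at least a million impostor pairs are needed before even one false accept is allowed, which is why such operating points are only reported on large benchmarks.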

URL

https://arxiv.org/abs/1905.00641

PDF

https://arxiv.org/pdf/1905.00641.pdf
