Abstract
Though tremendous strides have been made in uncontrolled face detection, accurate and efficient face localisation in the wild remains an open challenge. This paper presents a robust single-stage face detector, named RetinaFace, which performs pixel-wise face localisation on faces of various scales by taking advantage of joint extra-supervised and self-supervised multi-task learning. Specifically, we make contributions in the following five aspects: (1) We manually annotate five facial landmarks on the WIDER FACE dataset and observe significant improvement in hard face detection with the assistance of this extra supervision signal. (2) We further add a self-supervised mesh decoder branch for predicting pixel-wise 3D face shape information in parallel with the existing supervised branches. (3) On the WIDER FACE hard test set, RetinaFace outperforms the state-of-the-art average precision (AP) by $1.1\%$ (achieving an AP of $91.4\%$). (4) On the IJB-C test set, RetinaFace enables state-of-the-art methods (ArcFace) to improve their results in face verification (TAR=$89.59\%$ for FAR=1e-6). (5) By employing light-weight backbone networks, RetinaFace can run in real time on a single CPU core for a VGA-resolution image. Extra annotations and code will be released to facilitate future research.
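The verification figure in (4) is a true accept rate (TAR) at a fixed false accept rate (FAR). As a reading aid, the minimal sketch below shows one common way such a number can be computed from similarity scores. It is not the IJB-C evaluation protocol and not the authors' code: the function name `tar_at_far`, the strict `score > threshold` acceptance rule, and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def tar_at_far(genuine_scores, impostor_scores, far_target):
    """Return (TAR, achieved FAR, threshold) for a target FAR.

    Hypothetical helper: a pair is accepted when its score is strictly
    greater than the threshold. The threshold is the strictest one for
    which the achieved FAR does not exceed far_target.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    # Sort impostor (different-identity) scores from highest to lowest.
    impostor = np.sort(np.asarray(impostor_scores, dtype=float))[::-1]

    # Largest number of impostor pairs we may accept without exceeding the target.
    k = int(np.floor(far_target * impostor.size))

    # Using the (k+1)-th highest impostor score as threshold means at most
    # k impostor scores are strictly above it, even when scores are tied.
    threshold = impostor[k] if k < impostor.size else -np.inf

    far = float(np.mean(impostor > threshold))  # fraction of impostors accepted
    tar = float(np.mean(genuine > threshold))   # fraction of genuine pairs accepted
    return tar, far, float(threshold)


# Toy data (assumed values, not results from the paper).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.35, 0.45, 0.5, 0.6, 0.75]
tar, far, threshold = tar_at_far(genuine, impostor, far_target=0.25)
print(f"TAR={tar:.2f} at FAR={far:.2f} (threshold={threshold:.2f})")
```

At FAR=1e-6, at least one million impostor pairs are needed before even a single false accept is permitted, which is why operating points this strict are only meaningful on large benchmarks.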
URL
https://arxiv.org/abs/1905.00641