DeCaFA: Deep Convolutional Cascade for Face Alignment In The Wild

2019-04-04 13:36:11
Arnaud Dapogny, Kévin Bailly, Matthieu Cord

Abstract

Face alignment is an active area of computer vision that consists of localizing a set of facial landmarks, whose number and definition vary across datasets. State-of-the-art face alignment methods either perform end-to-end regression or refine the shape in a cascaded manner, starting from an initial guess. In this paper, we introduce DeCaFA, an end-to-end deep convolutional cascade architecture for face alignment. DeCaFA uses fully convolutional stages to keep full spatial resolution throughout the cascade. Between cascade stages, DeCaFA uses multiple chained transfer layers with spatial softmax to produce landmark-wise attention maps for each of several landmark alignment tasks. Weighted intermediate supervision and efficient feature fusion between the stages make it possible to progressively refine the attention maps in an end-to-end manner. We show experimentally that DeCaFA significantly outperforms existing approaches on the 300W, CelebA and WFLW databases. In addition, we show that DeCaFA can learn fine alignment with reasonable accuracy from very few images using coarsely annotated data.
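
The spatial softmax mentioned in the abstract can be illustrated with a minimal NumPy sketch. It only shows the general operation of turning one score map per landmark into an attention map that sums to 1 over the image grid; it is not the authors' implementation. The function names, the (landmarks, height, width) array layout, and the attention-weighted coordinate read-out are assumptions made for this illustration and are not taken from the paper.

```python
import numpy as np


def spatial_softmax(logits):
    """Normalize each landmark's score map into an attention map.

    logits: array of shape (L, H, W), one raw score map per landmark
            (layout assumed for this sketch).
    Returns an array of the same shape where each (H, W) map sums to 1.
    """
    num_landmarks, height, width = logits.shape
    flat = logits.reshape(num_landmarks, height * width)
    # Subtract the per-map maximum for numerical stability.
    flat = flat - flat.max(axis=1, keepdims=True)
    exp = np.exp(flat)
    attention = exp / exp.sum(axis=1, keepdims=True)
    return attention.reshape(num_landmarks, height, width)


def expected_coordinates(attention):
    """Attention-weighted mean (x, y) position for each landmark.

    This read-out is an assumption of the sketch, not a detail
    stated in the abstract.
    """
    _, height, width = attention.shape
    ys, xs = np.mgrid[0:height, 0:width]
    x = (attention * xs).sum(axis=(1, 2))
    y = (attention * ys).sum(axis=(1, 2))
    return np.stack([x, y], axis=1)


# Toy input: two landmarks on an 8x8 grid, each with one sharp peak.
toy_logits = np.zeros((2, 8, 8))
toy_logits[0, 3, 5] = 50.0  # peak at row 3, column 5
toy_logits[1, 6, 1] = 50.0  # peak at row 6, column 1

toy_attention = spatial_softmax(toy_logits)
toy_coords = expected_coordinates(toy_attention)
```

With a sharp peak, almost all of the attention mass falls on one pixel, so the weighted coordinates land close to the peak location. Flatter score maps give more diffuse attention, which is the kind of map a cascade could refine from stage to stage.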

URL

https://arxiv.org/abs/1904.02549

PDF

https://arxiv.org/pdf/1904.02549.pdf

