Adaptive Wing Loss for Robust Face Alignment via Heatmap Regression

2019-04-16 02:00:24
Xinyao Wang, Liefeng Bo, Li Fuxin

Abstract

Heatmap regression has become one of the mainstream approaches to localizing facial landmarks. As Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have become popular for solving computer vision tasks, extensive research has been done on these architectures. However, the loss function for heatmap regression is rarely studied. In this paper, we analyze the ideal loss function properties for heatmap regression in face alignment problems. We then propose a novel loss function, named Adaptive Wing loss, that is able to adapt its shape to different types of ground truth heatmap pixels. This adaptability decreases the loss to zero on foreground pixels while leaving some loss on background pixels. To address the imbalance between foreground and background pixels, we also propose the Weighted Loss Map, which assigns high weights to foreground and difficult background pixels to help the training process focus more on pixels that are crucial to landmark localization. To further improve face alignment accuracy, we introduce boundary prediction and CoordConv with boundary coordinates. Extensive experiments on different benchmarks, including COFW, 300W and WFLW, show that our approach outperforms the state of the art by a significant margin on various evaluation metrics. In addition, the Adaptive Wing loss also helps other heatmap regression tasks. Code will be made publicly available.
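
The abstract does not give the loss formula itself. As a rough illustration of a loss that "adapts its shape" to the ground truth pixel value, here is a minimal per-pixel sketch in plain Python. It assumes the piecewise form usually attributed to the Adaptive Wing loss: a logarithmic part for small errors and a linear part for large ones, with the exponent depending on the ground truth value y. The function name `adaptive_wing` and the default values for `alpha`, `omega`, `epsilon` and `theta` are assumptions made for this example and are not stated on this page.

```python
import math


def adaptive_wing(y, y_hat, alpha=2.1, omega=14.0, epsilon=1.0, theta=0.5):
    """Per-pixel loss for ground truth y and prediction y_hat, both in [0, 1].

    All default parameter values are assumptions made for this sketch.
    """
    diff = abs(y - y_hat)
    # The exponent depends on the ground truth value, so the curve changes
    # shape between foreground (y near 1) and background (y near 0) pixels.
    power = alpha - y
    if diff < theta:
        # Logarithmic part, used for small errors.
        return omega * math.log(1.0 + (diff / epsilon) ** power)
    # Linear part, used for large errors. The slope a and offset c are chosen
    # so that both pieces meet with equal value and slope at diff == theta.
    ratio = theta / epsilon
    a = omega * (1.0 / (1.0 + ratio ** power)) * power * ratio ** (power - 1.0) / epsilon
    c = theta * a - omega * math.log(1.0 + ratio ** power)
    return a * diff - c


# Same absolute error of 0.1 on a foreground and on a background pixel.
foreground_loss = adaptive_wing(1.0, 0.9)
background_loss = adaptive_wing(0.0, 0.1)
```

Under these assumed defaults, the same small error costs more on a foreground pixel than on a background pixel, so training is pushed to drive foreground errors toward zero while tolerating small background errors. The Weighted Loss Map described above would then multiply this per-pixel value by a larger weight on foreground and difficult background pixels; that step is left out of the sketch.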

URL

https://arxiv.org/abs/1904.07399

PDF

https://arxiv.org/pdf/1904.07399.pdf

