Paper Reading AI Learner

Joint Action Unit localisation and intensity estimation through heatmap regression

2018-07-20 14:19:16
Enrique Sanchez-Lozano, Georgios Tzimiropoulos, Michel Valstar

Abstract

This paper proposes a supervised learning approach to jointly perform facial Action Unit (AU) localisation and intensity estimation. Contrary to previous works that try to learn an unsupervised representation of the Action Unit regions, we propose to directly and jointly estimate all AU intensities through heatmap regression, along with the locations in the face where they cause visible changes. Our approach aims to learn a pixel-wise regression function returning a score per AU, which indicates the AU intensity at a given spatial location. Heatmap regression then generates an image, or channel, per AU, in which each pixel indicates the corresponding AU intensity. To generate the ground-truth heatmaps for a target AU, the facial landmarks are first estimated, and a 2D Gaussian is drawn around the points where the AU is known to cause changes. The amplitude and size of the Gaussian are determined by the intensity of the AU. We show that using a single Hourglass network suffices to attain new state-of-the-art results, demonstrating the effectiveness of such a simple approach. The use of heatmap regression allows learning of a shared representation between AUs without the need to rely on latent representations, as these are implicitly learned from the data. We validate the proposed approach on the BP4D dataset, showing a modest improvement over recent, complex techniques, as well as robustness against misalignment errors. Code for testing and models will be available to download from https://github.com/ESanchezLozano/Action-Units-Heatmaps.
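
To make the ground-truth construction described in the abstract concrete, below is a minimal NumPy sketch (not the authors' released code) of how one heatmap channel per AU could be built: a 2D Gaussian is placed at each landmark-derived point associated with the AU, with its amplitude and spread scaled by the annotated intensity. The landmark coordinates, the 0–5 intensity scale, and the exact scaling constants are illustrative assumptions.

```python
# Hypothetical sketch of ground-truth AU heatmap generation.
# All constants and landmark locations are illustrative assumptions.
import numpy as np

def gaussian_2d(height, width, center, sigma, amplitude):
    """Draw a 2D Gaussian of the given amplitude and spread centred at (x, y)."""
    x = np.arange(width)[None, :]
    y = np.arange(height)[:, None]
    cx, cy = center
    return amplitude * np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))

def au_heatmap(height, width, au_points, intensity, base_sigma=5.0, max_intensity=5.0):
    """Build one ground-truth channel for a single AU.

    au_points : (x, y) locations, derived from facial landmarks, where the AU
                causes visible changes (assumed to come from a landmark detector).
    intensity : annotated AU intensity on the 0-5 scale used in BP4D.
    Both the amplitude and the spread of each Gaussian grow with the intensity,
    mirroring the description in the abstract.
    """
    heatmap = np.zeros((height, width), dtype=np.float32)
    if intensity <= 0:
        return heatmap  # AU not active: the channel stays empty
    amplitude = intensity / max_intensity                    # peak value in [0, 1]
    sigma = base_sigma * (intensity / max_intensity + 0.5)   # wider blob for stronger AUs
    for point in au_points:
        heatmap = np.maximum(heatmap, gaussian_2d(height, width, point, sigma, amplitude))
    return heatmap

# Example: AU12 (lip-corner puller) at intensity 3, with two mouth-corner points.
heatmap = au_heatmap(64, 64, au_points=[(20, 45), (44, 45)], intensity=3)
print(heatmap.shape, heatmap.max())
```

A single Hourglass network would then be trained to regress one such channel per AU, and the predicted intensity can be read off as the peak value of the corresponding channel.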

Abstract (translated)

This paper proposes a supervised learning method that jointly performs facial Action Unit (AU) localisation and intensity estimation. In contrast to previous works that attempt to learn an unsupervised representation of the Action Unit regions, we propose to directly and jointly estimate all AU intensities through heatmap regression, together with the locations in the face where they cause visible changes. Our method aims to learn a pixel-wise regression function that returns a score per AU, indicating the AU intensity at a given spatial location. Heatmap regression then generates an image, or channel, per AU, in which each pixel indicates the corresponding AU intensity. To generate the ground-truth heatmaps for a target AU, the facial landmarks are first estimated, and a 2D Gaussian is drawn around the points where the AU is known to cause changes. The amplitude and size of the Gaussian are determined by the intensity of the AU. We show that a single Hourglass network suffices to attain new state-of-the-art results, demonstrating the effectiveness of such a simple approach. The use of heatmap regression allows a shared representation between AUs to be learned without relying on latent representations, as these are learned implicitly from the data. We validate the proposed method on the BP4D dataset, showing a modest improvement over recent, complex techniques, as well as robustness against misalignment errors. Code for testing and models will be available for download from https://github.com/ESanchezLozano/Action-Units-Heatmaps.

URL

https://arxiv.org/abs/1805.03487

PDF

https://arxiv.org/pdf/1805.03487.pdf

