Paper Reading AI Learner

Precise Facial Landmark Detection by Reference Heatmap Transformer

2023-03-14 12:26:48
Jun Wan, Jun Liu, Jie Zhou, Zhihui Lai, Linlin Shen, Hang Sun, Ping Xiong, Wenwen Min

Abstract

Most facial landmark detection methods predict landmarks by mapping the input facial appearance features to landmark heatmaps and have achieved promising results. However, when the face image is suffering from large poses, heavy occlusions and complicated illuminations, they cannot learn discriminative feature representations and effective facial shape constraints, nor can they accurately predict the value of each element in the landmark heatmap, limiting their detection accuracy. To address this problem, we propose a novel Reference Heatmap Transformer (RHT) by introducing reference heatmap information for more precise facial landmark detection. The proposed RHT consists of a Soft Transformation Module (STM) and a Hard Transformation Module (HTM), which can cooperate with each other to encourage the accurate transformation of the reference heatmap information and facial shape constraints. Then, a Multi-Scale Feature Fusion Module (MSFFM) is proposed to fuse the transformed heatmap features and the semantic features learned from the original face images to enhance feature representations for producing more accurate target heatmaps. To the best of our knowledge, this is the first study to explore how to enhance facial landmark detection by transforming the reference heatmap information. The experimental results from challenging benchmark datasets demonstrate that our proposed method outperforms the state-of-the-art methods in the literature.

Abstract (translated)

大多数面部地标检测方法通过将输入的面部外观特征映射到地标热图来预测地标,并取得了良好的结果。然而,当面部图像受到大型姿态、严重遮挡和复杂的照明条件时,它们无法学习鲜明的特征表示和有效的面部形状限制,也无法准确地预测地标热图每个元素的值,从而限制了它们的检测精度。为了解决这一问题,我们提出了一种新的参考热图Transformer(RHT),通过引入参考热图信息来提高更精确的面部地标检测。 proposed RHT由一个软转换模块(STM)和一个硬转换模块(HTM)组成,可以互相合作,鼓励准确转换参考热图信息和面部形状限制。然后,我们提出了一个多尺度特征融合模块(MSFFM),将转换后热图特征和从原始面部图像中学习到的语义特征进行融合,以增强特征表示,以产生更准确的目标热图。据我们所知,这是第一个研究探索通过转换参考热图信息来提高面部地标检测的方法。挑战性基准数据集的实验结果表明,我们提出的方法在文献中比最先进的方法表现更好。

URL

https://arxiv.org/abs/2303.07840

PDF

https://arxiv.org/pdf/2303.07840.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot