Faceptor: A Generalist Model for Face Perception

2024-03-14 15:42:31
Lixiong Qin, Mei Wang, Xuannan Liu, Yuhang Zhang, Wei Deng, Xiaoshuai Song, Weiran Xu, Weihong Deng

Abstract

With comprehensive research conducted on various face analysis tasks, there is growing interest among researchers in developing a unified approach to face perception. Existing methods mainly address unified representation and training, but lack task extensibility and application efficiency. To tackle this issue, we focus on the unified model structure and explore a face generalist model. As an intuitive design, Naive Faceptor enables tasks with the same output shape and granularity to share the structural design of a standardized output head, improving task extensibility. Furthermore, Faceptor adopts a well-designed single-encoder dual-decoder architecture, allowing task-specific queries to represent newly introduced semantics. This design further unifies the model structure while improving application efficiency in terms of storage overhead. Additionally, we introduce Layer-Attention into Faceptor, enabling the model to adaptively select features from optimal layers to perform the desired tasks. Through joint training on 13 face perception datasets, Faceptor achieves exceptional performance in facial landmark localization, face parsing, age estimation, expression recognition, binary attribute classification, and face recognition, matching or surpassing specialized methods in most tasks. Our training framework can also be applied to auxiliary supervised learning, significantly improving performance in data-sparse tasks such as age estimation and expression recognition. The code and models will be made publicly available at this https URL.
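The abstract says Layer-Attention lets the model "adaptively select features from optimal layers" but does not spell out the mechanism. One plausible reading, shown here purely as an illustrative sketch and not as the authors' implementation, is a softmax weighting over per-layer encoder features, with one set of weights per task. The names `layer_attention` and `task_logits` are hypothetical, and plain Python lists stand in for feature tensors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def layer_attention(layer_features, task_logits):
    """Mix per-layer feature vectors with softmax weights over layers.

    layer_features: one feature vector per encoder layer (equal lengths).
    task_logits: hypothetical per-task scores, one per layer; in a real
        model these would be learned, here they are passed in by hand.
    Returns (mixed_feature, weights).
    """
    if len(layer_features) != len(task_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(task_logits)
    dim = len(layer_features[0])
    mixed = [0.0] * dim
    for w, feat in zip(weights, layer_features):
        for i in range(dim):
            mixed[i] += w * feat[i]
    return mixed, weights


# Three toy "layers", each producing a 2-dimensional feature.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Equal logits give a plain average of the layers.
uniform_mix, uniform_w = layer_attention(features, [0.0, 0.0, 0.0])

# A dominant logit makes the task rely almost entirely on layer 1.
peaked_mix, peaked_w = layer_attention(features, [0.0, 50.0, 0.0])
```

Under this assumption, a task such as face parsing could place most of its weight on shallow, spatially detailed layers while age estimation could favor deeper ones; whether Faceptor does exactly this is not stated in the abstract.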

URL

https://arxiv.org/abs/2403.09500

PDF

https://arxiv.org/pdf/2403.09500.pdf

