Few-shot Face Image Translation via GAN Prior Distillation

2023-01-28 17:30:44
Ruoyu Zhao, Mingrui Zhu, Xiaoyu Wang, Nannan Wang

Abstract

Face image translation has made notable progress in recent years. However, when training on limited data, the performance of existing approaches declines significantly. Although some studies have attempted to tackle this problem, they either fail to reach the few-shot setting (fewer than 10 samples) or achieve only suboptimal results. In this paper, we propose GAN Prior Distillation (GPD) to enable effective few-shot face image translation. GPD contains two models: a teacher network with a GAN prior and a student network that performs end-to-end translation. Specifically, we take a teacher network trained on large-scale data in the source domain and adapt it to the target domain using only a few samples, so that it learns the target domain's knowledge. We then achieve few-shot augmentation by generating source-domain and target-domain images simultaneously from the same latent codes. We propose an anchor-based knowledge distillation module that makes full use of the difference between the training data and the augmented data to distill the teacher network's knowledge into the student network. By absorbing this additional knowledge, the trained student network achieves excellent generalization performance. Qualitative and quantitative experiments demonstrate that our method achieves results superior to those of state-of-the-art approaches in the few-shot setting.
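
The shared-latent-code augmentation described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the two "generators" are plain linear maps standing in for the source-domain teacher and its few-shot-adapted copy, the "student" is a least-squares linear map, and names such as `sample_paired_images` and `fit_student` are hypothetical. The paper's actual networks and its anchor-based distillation module are not reproduced.

```python
import numpy as np


def sample_paired_images(w_source, w_target, num_pairs, rng):
    """Decode the SAME latent codes with both toy generators.

    w_source: stand-in for the teacher trained on the source domain.
    w_target: stand-in for the teacher after few-shot adaptation.
    Returns (source_images, target_images), aligned row by row.
    """
    latent_dim = w_source.shape[0]
    z = rng.normal(size=(num_pairs, latent_dim))
    return z @ w_source, z @ w_target


def fit_student(source_images, target_images):
    """Fit a linear source-to-target map on the generated pairs.

    Least squares is a simple stand-in for training a student network;
    it is not the paper's distillation procedure.
    """
    student, *_ = np.linalg.lstsq(source_images, target_images, rcond=None)
    return student


rng = np.random.default_rng(0)
latent_dim = image_dim = 8  # toy sizes, chosen arbitrarily

# Hypothetical teacher weights: the adapted copy is a perturbed source model.
w_source = rng.normal(size=(latent_dim, image_dim))
w_target = w_source + 0.1 * rng.normal(size=(latent_dim, image_dim))

# Augmentation: many aligned pairs, even if only a few real targets exist.
src_imgs, tgt_imgs = sample_paired_images(w_source, w_target, 64, rng)
student = fit_student(src_imgs, tgt_imgs)

# Check the student on pairs built from fresh latent codes.
test_src, test_tgt = sample_paired_images(w_source, w_target, 16, rng)
max_error = float(np.max(np.abs(test_src @ student - test_tgt)))
```

The point of the sketch is only that decoding one latent code with both generators yields an aligned source/target pair, so a translation model can be trained on far more pairs than the handful of real target samples.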

URL

https://arxiv.org/abs/2301.12257

PDF

https://arxiv.org/pdf/2301.12257.pdf

