StyleGANEX: StyleGAN-Based Manipulation Beyond Cropped Aligned Faces

2023-03-10 18:59:33
Shuai Yang, Liming Jiang, Ziwei Liu, Chen Change Loy

Abstract

Recent advances in face manipulation using StyleGAN have produced impressive results. However, StyleGAN is inherently limited to cropped aligned faces at a fixed image resolution it is pre-trained on. In this paper, we propose a simple and effective solution to this limitation by using dilated convolutions to rescale the receptive fields of shallow layers in StyleGAN, without altering any model parameters. This allows fixed-size small features at shallow layers to be extended into larger ones that can accommodate variable resolutions, making them more robust in characterizing unaligned faces. To enable real face inversion and manipulation, we introduce a corresponding encoder that provides the first-layer feature of the extended StyleGAN in addition to the latent style code. We validate the effectiveness of our method using unaligned face inputs of various resolutions in a diverse set of face manipulation tasks, including facial attribute editing, super-resolution, sketch/mask-to-face translation, and face toonification.
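As a minimal sketch of the general idea behind dilated convolutions (not the authors' implementation, and not the actual StyleGAN layer code), the pure-Python example below applies the same 3x3 kernel with dilation 1 and dilation 2. The function names, the kernel values, and the 8x8 ramp image are all hypothetical and chosen only for illustration. It shows that dilation widens the receptive field while the weights themselves stay untouched.

```python
def receptive_field(kernel_size, dilation):
    """Side length of the input window that one output value depends on."""
    return dilation * (kernel_size - 1) + 1


def dilated_conv2d(image, kernel, dilation=1):
    """Valid (no padding) 2D cross-correlation with a dilated square kernel.

    image and kernel are lists of lists of numbers. The kernel weights are
    reused unchanged; only the spacing between sampled pixels grows.
    """
    k = len(kernel)
    span = receptive_field(k, dilation)
    out_h = len(image) - span + 1
    out_w = len(image[0]) - span + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Sample the input every `dilation` pixels.
                    acc += kernel[i][j] * image[y + i * dilation][x + j * dilation]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 3x3 weights (a horizontal gradient filter), used for both calls.
kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
# Hypothetical 8x8 input whose value equals its column index.
image = [[float(x) for x in range(8)] for _ in range(8)]

plain = dilated_conv2d(image, kernel, dilation=1)  # 3x3 receptive field
wide = dilated_conv2d(image, kernel, dilation=2)   # 5x5 receptive field

print(receptive_field(3, 1), receptive_field(3, 2))
print(len(plain), len(wide))
print(plain[0][0], wide[0][0])
```

Both calls use the same nine weights, yet the dilated call covers a 5x5 window instead of a 3x3 one, so its response to the ramp doubles. How the paper applies this inside StyleGAN's shallow layers is not described in the abstract, and this sketch makes no claim about it.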

URL

https://arxiv.org/abs/2303.06146

PDF

https://arxiv.org/pdf/2303.06146.pdf

