Paper Reading AI Learner

PP-GAN: Style Transfer from Korean Portraits to ID Photos Using Landmark Extractor with GAN

2023-06-23 10:10:16
Jongwook Si, Sungyoung Kim

Abstract

The objective of style transfer is to maintain the content of an image while transferring the style of another image. However, conventional style transfer research has a significant limitation in preserving facial landmarks, such as the eyes, nose, and mouth, which are crucial for maintaining the identity of the image. In Korean portraits, the majority of individuals wear a "Gat", a type of headdress worn exclusively by men. Because its characteristics differ markedly from the hair in ID photos, transferring the "Gat" is challenging. To address this issue, this study proposes a deep learning network that can perform style transfer, including the "Gat", while preserving the identity of the face. Unlike existing style transfer approaches, the proposed method aims to preserve the texture, costume, and "Gat" of the style image. A Generative Adversarial Network forms the backbone of the proposed network. Color, texture, and intensity were extracted differently according to the characteristics of each block and layer of a pre-trained VGG-16, and only the elements needed during training were preserved using a facial landmark mask. The head area was represented using the eyebrow area to transfer the "Gat". Furthermore, the identity of the face was retained, and style correlation was taken into account through the Gram matrix. The proposed approach demonstrated superior transfer and preservation performance compared with previous studies.
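
As an illustration of the masked, Gram-matrix-based style correlation on pre-trained VGG-16 features described above, the sketch below assumes PyTorch/torchvision, a binary facial-landmark mask, and the relu1_2, relu2_2, relu3_3, and relu4_3 layers; the paper does not specify these details here, so the layer choice, mask handling, and weighting are assumptions rather than the authors' implementation.

    # Minimal sketch (assumed, not the authors' code): masked Gram-matrix style loss
    # computed on pre-trained VGG-16 features.
    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg16

    def gram_matrix(feat):
        # feat: (B, C, H, W) -> (B, C, C) channel-correlation (Gram) matrix
        b, c, h, w = feat.shape
        f = feat.reshape(b, c, h * w)
        return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

    class MaskedStyleLoss(torch.nn.Module):
        # Indices 3, 8, 15, 22 correspond to relu1_2, relu2_2, relu3_3, relu4_3 in
        # torchvision's VGG-16; the layers actually used in the paper may differ.
        def __init__(self, layer_ids=(3, 8, 15, 22)):
            super().__init__()
            self.vgg = vgg16(weights="IMAGENET1K_V1").features.eval()
            for p in self.vgg.parameters():
                p.requires_grad_(False)
            self.layer_ids = set(layer_ids)

        def extract(self, x, mask=None):
            feats = []
            for i, layer in enumerate(self.vgg):
                x = layer(x)
                if i in self.layer_ids:
                    f = x
                    if mask is not None:
                        # Resize the binary landmark mask to the feature resolution and
                        # keep only the selected regions (e.g., excluding eyes/nose/mouth).
                        m = F.interpolate(mask, size=x.shape[-2:], mode="nearest")
                        f = f * m
                    feats.append(f)
            return feats

        def forward(self, generated, style, mask=None):
            g_feats = self.extract(generated, mask)
            s_feats = self.extract(style, mask)
            return sum(F.mse_loss(gram_matrix(g), gram_matrix(s))
                       for g, s in zip(g_feats, s_feats))

In the full pipeline, a term of this kind would be combined with the adversarial loss of the GAN backbone and an identity-preservation objective, as the abstract describes.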

URL

https://arxiv.org/abs/2306.13418

PDF

https://arxiv.org/pdf/2306.13418.pdf

