Paper Reading AI Learner

Appearance and Pose-Conditioned Human Image Generation using Deformable GANs

2019-04-30 07:35:15
Aliaksandr Siarohin, Stéphane Lathuilière, Enver Sangineto, Nicu Sebe

Abstract

In this paper, we address the problem of generating person images conditioned on both pose and appearance information. Specifically, given an image xa of a person and a target pose P(xb), extracted from a different image xb, we synthesize a new image of that person in pose P(xb), while preserving the visual details in xa. In order to deal with pixel-to-pixel misalignments caused by the pose differences between P(xa) and P(xb), we introduce deformable skip connections in the generator of our Generative Adversarial Network. Moreover, a nearest-neighbour loss is proposed instead of the common L1 and L2 losses in order to match the details of the generated image with the target image. Quantitative and qualitative results, using common datasets and protocols recently proposed for this task, show that our approach is competitive with respect to the state of the art. Moreover, we conduct an extensive evaluation using off-the-shelf person re-identification (Re-ID) systems trained with person-generation based augmented data, which is one of the main applications of this task. Our experiments show that our Deformable GANs can significantly boost the Re-ID accuracy and even outperform data-augmentation methods specifically trained using Re-ID losses.
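The nearest-neighbour loss mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' exact formulation (the search-window size, padding mode, and per-pixel L1 distance here are assumptions); it only shows the key idea: each generated pixel is compared to its best match within a small neighbourhood of the target, so small spatial misalignments are not penalised the way a plain per-pixel L1 loss would penalise them.

```python
import numpy as np

def nearest_neighbour_loss(generated, target, window=1):
    """Sketch of a nearest-neighbour loss (illustrative, not the paper's
    exact definition). For each pixel of `generated`, take the minimum
    L1 distance to any `target` pixel inside a (2*window+1)^2
    neighbourhood, then average over all pixels."""
    h, w, _ = generated.shape
    # Pad the target so border pixels also have a full search window.
    padded = np.pad(target,
                    ((window, window), (window, window), (0, 0)),
                    mode="edge")
    best = np.full((h, w), np.inf)
    # Slide the target over all (2*window+1)^2 offsets and keep, per
    # pixel, the smallest per-pixel L1 distance seen so far.
    for dy in range(2 * window + 1):
        for dx in range(2 * window + 1):
            shifted = padded[dy:dy + h, dx:dx + w]
            dist = np.abs(generated - shifted).sum(axis=-1)
            best = np.minimum(best, dist)
    return best.mean()
```

For example, a single bright pixel shifted by one position incurs zero nearest-neighbour loss (the match is found inside the window) while the plain per-pixel L1 loss is strictly positive.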

URL

https://arxiv.org/abs/1905.00007

PDF

https://arxiv.org/pdf/1905.00007.pdf

