Generative Semantic Manipulation with Contrasting GAN

2017-08-01 13:46:32
Xiaodan Liang, Hao Zhang, Eric P. Xing

Abstract

Generative Adversarial Networks (GANs) have recently achieved significant improvements on paired/unpaired image-to-image translation, such as photo$\rightarrow$sketch and artist painting style transfer. However, existing models are only capable of transferring low-level information (e.g., color or texture changes) and fail to edit the high-level semantic meaning (e.g., geometric structure or content) of objects. On the other hand, while some works can synthesize compelling real-world images given a class label or caption, they cannot condition on arbitrary shapes or structures, which largely limits their application scenarios and the interpretability of their results. In this work, we focus on a more challenging semantic manipulation task, which aims to modify the semantic meaning of an object while preserving its own characteristics (e.g., viewpoint and shape), such as cow$\rightarrow$sheep, motor$\rightarrow$bicycle, and cat$\rightarrow$dog. To tackle such large semantic changes, we introduce a contrasting GAN (contrast-GAN) with a novel adversarial contrasting objective. Instead of directly pushing the synthesized samples toward the target data as previous GANs do, our adversarial contrasting objective optimizes over distance comparisons between samples, that is, it enforces that the manipulated data be semantically closer to real data of the target category than the input data is. Equipped with this new contrasting objective, a novel mask-conditional contrast-GAN architecture is proposed to disentangle the image background from the object's semantic changes. Experiments on several semantic manipulation tasks on the ImageNet and MSCOCO datasets show a considerable performance gain of our contrast-GAN over other conditional GANs. Quantitative results further demonstrate the superiority of our model in generating manipulated results with high visual fidelity and reasonable object semantics.
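
The adversarial contrasting objective described in the abstract amounts to a triplet-style distance comparison: the manipulated image should end up closer (in the discriminator's semantic feature space) to real images of the target category than the original input image is. The snippet below is a minimal, hypothetical sketch of that idea, together with the mask-based compositing suggested by the mask-conditional architecture; the function names, the margin value, and the choice of Euclidean distance are assumptions for illustration, not the authors' exact formulation.

```python
import torch.nn.functional as F

def contrasting_loss(feat_fake, feat_input, feat_target, margin=1.0):
    """Hypothetical adversarial contrasting term (a sketch, not the paper's exact loss).

    feat_fake:   discriminator features of the manipulated (generated) image
    feat_input:  discriminator features of the original input image
    feat_target: discriminator features of a real image from the target category

    Pushes the manipulated image to be semantically closer to the real
    target-category sample than the input image is, by at least `margin`.
    """
    d_fake = F.pairwise_distance(feat_fake, feat_target)    # manipulated -> target class
    d_input = F.pairwise_distance(feat_input, feat_target)  # input -> target class
    return F.relu(d_fake - d_input + margin).mean()          # hinge on the distance comparison


def composite_with_mask(generated_object, input_image, mask):
    """Assumed mask-conditional compositing: apply the semantic edit inside the
    object mask only and keep the background pixels from the input image
    (mask is 1 inside the object region, 0 elsewhere)."""
    return mask * generated_object + (1.0 - mask) * input_image
```

In a full model, terms like these would presumably be combined with the usual adversarial and reconstruction losses of a conditional GAN; the abstract does not specify the exact distance metric, margin, or loss weights.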


URL

https://arxiv.org/abs/1708.00315

PDF

https://arxiv.org/pdf/1708.00315.pdf
