Paper Reading AI Learner

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space

2017-04-12 06:39:52
Anh Nguyen, Jeff Clune, Yoshua Bengio, Alexey Dosovitskiy, Jason Yosinski

Abstract

Generating high-resolution, photo-realistic images has been a long-standing goal in machine learning. Recently, Nguyen et al. (2016) showed one interesting way to synthesize novel images by performing gradient ascent in the latent space of a generator network to maximize the activations of one or multiple neurons in a separate classifier network. In this paper we extend this method by introducing an additional prior on the latent code, improving both sample quality and sample diversity, leading to a state-of-the-art generative model that produces high quality images at higher resolutions (227x227) than previous generative models, and does so for all 1000 ImageNet categories. In addition, we provide a unified probabilistic interpretation of related activation maximization methods and call the general class of models "Plug and Play Generative Networks". PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable "condition" network C that tells the generator what to draw. We demonstrate the generation of images conditioned on a class (when C is an ImageNet or MIT Places classification network) and also conditioned on a caption (when C is an image captioning network). Our method also improves the state of the art of Multifaceted Feature Visualization, which generates the set of synthetic inputs that activate a neuron in order to better understand how deep neural networks operate. Finally, we show that our model performs reasonably well at the task of image inpainting. While image models are used in this paper, the approach is modality-agnostic and can be applied to many types of data.
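The abstract describes sampling by iteratively updating a latent code z using a prior term, the gradient of a condition network's class score, and a small amount of noise. Below is a minimal PyTorch sketch of one such update step, assuming toy stand-in networks: `G`, `C`, `ppgn_step`, and the step sizes `eps1`/`eps2`/`eps3` are illustrative names only, and the simple Gaussian prior used here replaces the learned denoising-autoencoder prior and pretrained ImageNet-scale networks used in the paper.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the pretrained networks in the paper (illustrative only):
# G maps a latent code z to a flattened image, C classifies that image.
latent_dim, img_pixels, num_classes = 64, 32 * 32, 10
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_pixels), nn.Sigmoid())
C = nn.Sequential(nn.Linear(img_pixels, 128), nn.ReLU(),
                  nn.Linear(128, num_classes))

def ppgn_step(z, target_class, eps1=1e-3, eps2=1.0, eps3=1e-5):
    """One iteration of conditional sampling in latent space.

    Combines (a) a prior term pulling z toward a standard Gaussian
    (a stand-in for the paper's learned prior), (b) the gradient of
    the condition network's log-probability for the target class,
    and (c) a small noise term.
    """
    z = z.detach().requires_grad_(True)
    logits = C(G(z))
    log_p_class = torch.log_softmax(logits, dim=1)[0, target_class]
    (grad_cond,) = torch.autograd.grad(log_p_class, z)
    prior_grad = -z                      # d/dz log N(z; 0, I)
    noise = eps3 * torch.randn_like(z)
    return (z + eps1 * prior_grad + eps2 * grad_cond + noise).detach()

# Usage: start from a random code and iterate; G(z) drifts toward
# images that C assigns to the requested class.
z = torch.randn(1, latent_dim)
for _ in range(200):
    z = ppgn_step(z, target_class=3)
sample = G(z).reshape(32, 32)
```

Swapping in a different condition network C (for example, a captioning network instead of a classifier) changes what is generated without retraining G, which is the "plug and play" property the abstract refers to.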

URL

https://arxiv.org/abs/1612.00005

PDF

https://arxiv.org/pdf/1612.00005.pdf

