Abstract
Generating high-resolution, photo-realistic images has been a long-standing goal in machine learning. Recently, Nguyen et al. (2016) showed one interesting way to synthesize novel images: performing gradient ascent in the latent space of a generator network to maximize the activations of one or more neurons in a separate classifier network. In this paper we extend this method by introducing an additional prior on the latent code, improving both sample quality and sample diversity, and leading to a state-of-the-art generative model that produces high-quality images at higher resolutions (227x227) than previous generative models, and does so for all 1000 ImageNet categories. In addition, we provide a unified probabilistic interpretation of related activation maximization methods and call the general class of models "Plug and Play Generative Networks". PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable "condition" network C that tells the generator what to draw. We demonstrate the generation of images conditioned on a class (when C is an ImageNet or MIT Places classification network) and also conditioned on a caption (when C is an image captioning network). Our method also improves the state of the art of Multifaceted Feature Visualization, which generates the set of synthetic inputs that activate a neuron in order to better understand how deep neural networks operate. Finally, we show that our model performs reasonably well at the task of image inpainting. While image models are used in this paper, the approach is modality-agnostic and can be applied to many types of data.
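The core sampling procedure described above — gradient ascent on a latent code z that maximizes a classifier activation while a prior on z keeps samples plausible — can be illustrated with a minimal sketch. This uses toy linear stand-ins for the generator G and condition network C (the paper uses deep networks; the names and dimensions below are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a linear "generator" G(z) = Wg @ z and a linear
# "classifier" C(x) = Wc @ x. Hypothetical shapes for illustration only.
latent_dim, image_dim, n_classes = 8, 32, 10
Wg = rng.standard_normal((image_dim, latent_dim))
Wc = rng.standard_normal((n_classes, image_dim))

def sample_class(target, steps=200, lr=0.05, lam=0.05):
    """Gradient ascent on z to maximize the target logit of C(G(z)),
    plus a Gaussian log-prior on z (the -lam * ||z||^2 term), which is
    the extra prior that regularizes the latent code."""
    z = rng.standard_normal(latent_dim)
    A = Wc @ Wg  # composed linear map, so the gradient is analytic
    for _ in range(steps):
        # d/dz [ (A @ z)[target] - lam * ||z||^2 ]
        grad = A[target] - 2 * lam * z
        z += lr * grad
    return z

def target_logit(z, target=3):
    return (Wc @ Wg @ z)[target]

z_random = rng.standard_normal(latent_dim)
z_opt = sample_class(target=3)
print(target_logit(z_opt) > target_logit(z_random))  # ascent raises the target logit
```

In the actual PPGN the same loop runs through deep networks with backpropagation supplying the gradients, and the prior is realized by a learned model of the latent space rather than a fixed Gaussian penalty; swapping in a different C (e.g. a captioning network) changes what is drawn without retraining G.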
URL
https://arxiv.org/abs/1612.00005