Paper Reading AI Learner

Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting

2019-04-16 05:51:37
Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo

Abstract

High-quality image inpainting requires filling missing regions in a damaged image with plausible content. Existing works either fill the regions by copying image patches or by generating semantically coherent patches from the region context, neglecting the fact that both visual and semantic plausibility are highly demanded. In this paper, we propose a Pyramid-context ENcoder Network (PEN-Net) for image inpainting by deep generative models. The PEN-Net is built upon a U-Net structure, which can restore an image by encoding contextual semantics from the full-resolution input and decoding the learned semantic features back into an image. Specifically, we propose a pyramid-context encoder, which progressively learns region affinity by attention from a high-level semantic feature map and transfers the learned attention to the previous low-level feature map. As the missing content can be filled by attention transfer from deep to shallow in a pyramid fashion, both visual and semantic coherence for image inpainting can be ensured. We further propose a multi-scale decoder with deeply-supervised pyramid losses and an adversarial loss. Such a design not only results in fast convergence in training, but also in more realistic results in testing. Extensive experiments on various datasets show the superior performance of the proposed network.
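The core idea of the pyramid-context encoder — learn patch affinity at a high (semantic) level, then reuse those attention weights to fill the hole at a lower (more detailed) level — can be sketched in a few lines. This is an illustrative NumPy sketch, not the authors' implementation: the patch extraction, feature names, and cosine-similarity scoring are assumptions made for clarity.

```python
import numpy as np

def attention_transfer(high_ctx, high_hole, low_ctx):
    """Sketch of cross-layer attention transfer (assumed simplification).

    high_ctx:  (N, Ch) high-level features of N context patches
    high_hole: (M, Ch) high-level features of M hole patches
    low_ctx:   (N, Cl) low-level features of the same N context patches
    returns:   (M, Cl) low-level hole features filled by attention transfer
    """
    # Cosine-similarity affinity between hole and context patches,
    # computed on the high-level (semantic) feature map.
    hc = high_ctx / (np.linalg.norm(high_ctx, axis=1, keepdims=True) + 1e-8)
    hh = high_hole / (np.linalg.norm(high_hole, axis=1, keepdims=True) + 1e-8)
    scores = hh @ hc.T                          # (M, N)

    # Softmax over context patches -> attention weights per hole patch.
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)

    # Transfer: the same weights combine the *low-level* context patches,
    # so the fill is semantically guided but visually detailed.
    return attn @ low_ctx                       # (M, Cl)
```

Applied repeatedly from the deepest encoder layer to the shallowest, this gives the "deep to shallow, pyramid fashion" filling the abstract describes: affinities learned where semantics are reliable steer reconstruction where texture detail lives.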

URL

https://arxiv.org/abs/1904.07475

PDF

https://arxiv.org/pdf/1904.07475.pdf

