Paper Reading AI Learner

Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting

2019-04-16 05:51:37
Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo

Abstract

High-quality image inpainting requires filling missing regions in a damaged image with plausible content. Existing works either fill the regions by copying image patches or by generating semantically coherent patches from the region context, neglecting the fact that both visual and semantic plausibility are highly demanded. In this paper, we propose a Pyramid-context ENcoder Network (PEN-Net) for image inpainting by deep generative models. The PEN-Net is built upon a U-Net structure, which can restore an image by encoding contextual semantics from the full-resolution input and decoding the learned semantic features back into an image. Specifically, we propose a pyramid-context encoder, which progressively learns region affinity by attention from a high-level semantic feature map and transfers the learned attention to the previous low-level feature map. As the missing content can be filled by attention transfer from deep to shallow in a pyramid fashion, both visual and semantic coherence for image inpainting can be ensured. We further propose a multi-scale decoder with deeply-supervised pyramid losses and an adversarial loss. Such a design not only results in fast convergence in training, but also in more realistic results in testing. Extensive experiments on various datasets show the superior performance of the proposed network.
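The core idea of the pyramid-context encoder — learn patch affinity at a high (semantic) level, then reuse those attention weights to fill the hole at a lower (more detailed) level — can be sketched in a few lines. This is an illustrative NumPy sketch, not the authors' implementation: the patch extraction, feature names, and cosine-similarity scoring are assumptions made for clarity.

```python
import numpy as np

def attention_transfer(high_ctx, high_hole, low_ctx):
    """Sketch of cross-layer attention transfer (assumed simplification).

    high_ctx:  (N, Ch) high-level features of N context patches
    high_hole: (M, Ch) high-level features of M hole patches
    low_ctx:   (N, Cl) low-level features of the same N context patches
    returns:   (M, Cl) low-level hole features filled by attention transfer
    """
    # Cosine-similarity affinity between hole and context patches,
    # computed on the high-level (semantic) feature map.
    hc = high_ctx / (np.linalg.norm(high_ctx, axis=1, keepdims=True) + 1e-8)
    hh = high_hole / (np.linalg.norm(high_hole, axis=1, keepdims=True) + 1e-8)
    scores = hh @ hc.T                          # (M, N)

    # Softmax over context patches -> attention weights per hole patch.
    scores -= scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)

    # Transfer: the same weights combine the *low-level* context patches,
    # so the fill is semantically guided but visually detailed.
    return attn @ low_ctx                       # (M, Cl)
```

Applied repeatedly from the deepest encoder layer to the shallowest, this gives the "deep to shallow, pyramid fashion" filling the abstract describes: affinities learned where semantics are reliable steer reconstruction where texture detail lives.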

URL

https://arxiv.org/abs/1904.07475

PDF

https://arxiv.org/pdf/1904.07475.pdf

