Paper Reading AI Learner

Latent Convolutional Models

2018-06-16 19:31:32
ShahRukh Athar, Evgeniy Burnaev, Victor Lempitsky

Abstract

We present a new latent model of natural images that can be learned on large-scale datasets. The learning process provides a latent embedding for every image in the training dataset, as well as a deep convolutional network that maps the latent space to the image space. After training, the new model provides a strong and universal image prior for a variety of image restoration tasks such as large-hole inpainting, super-resolution, and colorization. To model high-resolution natural images, our approach uses latent spaces of very high dimensionality (one to two orders of magnitude higher than previous latent image models). To tackle this high dimensionality, we use latent spaces with a special manifold structure (convolutional manifolds) parameterized by a ConvNet with a specific architecture. In the experiments, we compare the learned latent models with latent models learned by autoencoders, advanced variants of generative adversarial networks, and a strong baseline system that uses a simpler parameterization of the latent space. Our model outperforms the competing approaches over a range of restoration tasks.
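
How a learned latent model serves as an image prior for restoration can be illustrated with a toy sketch, assuming the generator is already trained and held fixed. Inpainting then becomes a search for the latent code whose generated image matches the observed pixels, after which the missing pixels are read off the generator output. Everything here is a hypothetical stand-in (the functions `generate`, `fit_latent` and `inpaint`, the 4-pixel linear "generator" `W`, the step size and the iteration count). The paper's model uses a deep ConvNet generator and a convolutional-manifold latent space, and this sketch reproduces neither.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def generate(weights: Matrix, z: Vector) -> Vector:
    """Hypothetical toy 'generator': a fixed linear map from latent code to pixels."""
    return [sum(w * zj for w, zj in zip(row, z)) for row in weights]


def fit_latent(weights: Matrix, observed: Vector, mask: List[bool],
               steps: int = 500, lr: float = 0.1) -> Vector:
    """Gradient descent on z for the masked loss sum_i m_i * (g(z)_i - y_i)^2."""
    z = [0.0] * len(weights[0])
    for _ in range(steps):
        pred = generate(weights, z)
        # Residuals only on observed pixels; hidden pixels contribute nothing to the loss.
        resid = [(p - y) if m else 0.0 for p, y, m in zip(pred, observed, mask)]
        # Gradient of the masked squared error: 2 * W^T (masked residual).
        grad = [2.0 * sum(weights[i][j] * resid[i] for i in range(len(weights)))
                for j in range(len(z))]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


def inpaint(weights: Matrix, observed: Vector, mask: List[bool], **kwargs) -> Vector:
    """Keep observed pixels; fill hidden ones from the generator output at the fitted z."""
    z = fit_latent(weights, observed, mask, **kwargs)
    pred = generate(weights, z)
    return [y if m else p for y, p, m in zip(observed, pred, mask)]


if __name__ == "__main__":
    # Hypothetical toy setup: 4 pixels, 2-dimensional latent code.
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
    clean = generate(W, [1.0, -2.0])          # [1.0, -2.0, -1.0, 3.0]
    mask = [True, True, True, False]           # last pixel is the "hole"
    corrupted = [c if m else 0.0 for c, m in zip(clean, mask)]
    restored = inpaint(W, corrupted, mask)
    print([round(v, 6) for v in restored])     # [1.0, -2.0, -1.0, 3.0]
```

Because the hidden pixel never enters the loss, its restored value comes entirely from the structure encoded in the generator. With a deep generator, the same masked fit would normally be done with automatic differentiation rather than the hand-written gradient used here.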

URL

https://arxiv.org/abs/1806.06284

PDF

https://arxiv.org/pdf/1806.06284.pdf
