Paper Reading AI Learner

Raising The Limit Of Image Rescaling Using Auxiliary Encoding

2023-03-12 20:49:07
Chenzhong Yin, Zhihong Pan, Xin Zhou, Le Kang, Paul Bogdan

Abstract

Normalizing flow models based on invertible neural networks (INN) have been widely investigated for generative image super-resolution (SR), learning the transformation between the normal distribution of a latent variable $z$ and the conditional distribution of high-resolution (HR) images given a low-resolution (LR) input. Recently, image rescaling models like IRN utilize the bidirectional nature of INN to push the performance limit of image upscaling by optimizing the downscaling and upscaling steps jointly. While random sampling of the latent variable $z$ is useful for generating diverse photo-realistic images, it is not desirable for image rescaling, where accurate restoration of the HR image is more important. Hence, in place of random sampling of $z$, we propose auxiliary encoding modules to further push the limit of image rescaling performance. Two options for storing the encoded latent variables in the downscaled LR image are proposed, both readily supported by existing image file formats: one stores them as the alpha channel, the other as metadata in the image header, and the corresponding modules are denoted with the suffixes -A and -M, respectively. Optimal network architectural changes are investigated for both options to demonstrate their effectiveness in raising the rescaling performance limit of different baseline models, including IRN and DLV-IRN.
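The two storage options can be illustrated with standard image I/O. The sketch below is a minimal illustration under assumed interfaces, not the paper's implementation: it assumes the auxiliary encoder produces a single-channel latent map in [0, 1] with the same spatial size as the LR image, and shows how such a map could be written either as the alpha channel of an RGBA PNG (the -A option) or as metadata in the PNG header (the -M option) using Pillow.

```python
# Illustrative sketch only (assumed interfaces, not the paper's code):
# store an encoded latent map alongside a downscaled LR image, either as
# the alpha channel of an RGBA PNG (-A) or as PNG header metadata (-M).
import base64
import numpy as np
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def save_with_alpha(lr_rgb: np.ndarray, latent: np.ndarray, path: str) -> None:
    """-A option: quantize the latent map to 8 bits and attach it as alpha."""
    alpha = np.clip(latent * 255.0, 0, 255).astype(np.uint8)  # assumes latent in [0, 1]
    rgba = np.dstack([lr_rgb, alpha])                          # H x W x 4, uint8
    Image.fromarray(rgba, mode="RGBA").save(path)

def save_with_metadata(lr_rgb: np.ndarray, latent: np.ndarray, path: str) -> None:
    """-M option: serialize the latent map into a PNG text chunk in the header."""
    meta = PngInfo()
    meta.add_text("aux_latent",
                  base64.b64encode(latent.astype(np.float16).tobytes()).decode())
    meta.add_text("aux_latent_shape", ",".join(map(str, latent.shape)))
    Image.fromarray(lr_rgb, mode="RGB").save(path, pnginfo=meta)

def load_latent_from_metadata(path: str) -> np.ndarray:
    """Recover the latent map from the PNG header for the upscaling pass."""
    img = Image.open(path)
    shape = tuple(int(s) for s in img.text["aux_latent_shape"].split(","))
    buf = base64.b64decode(img.text["aux_latent"])
    return np.frombuffer(buf, dtype=np.float16).reshape(shape).astype(np.float32)
```

In an actual pipeline the latent would come from the paper's auxiliary encoding module and would likely require more careful quantization; the point here is only that both storage paths are supported by a standard format such as PNG, so no custom container is required.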

Abstract (translated)

Normalizing flow models using invertible neural networks (INN) have been widely studied for generative image super-resolution (SR) by learning the transformation between the normal distribution of the latent variable $z$ and the conditional distribution of high-resolution (HR) images given a low-resolution (LR) input. Recently, image rescaling models such as IRN exploit the bidirectional nature of INN, jointly optimizing the downscaling and upscaling steps to push the limit of upscaling performance. While random sampling of the latent variable $z$ is useful for generating diverse photo-realistic images, it is not ideal for image rescaling, where accurate restoration of the HR image matters more. Therefore, in place of random sampling of $z$, we propose auxiliary encoding modules to further push the limit of rescaling performance. Two options for storing the encoded latent variables in the downscaled LR image are proposed: one saves them as the alpha channel, the other as metadata in the image header; the corresponding modules carry the suffixes -A and -M, respectively. For both options, the optimal network architectural changes are investigated to demonstrate their effectiveness in raising the rescaling performance limit of different baseline models, including IRN and DLV-IRN.

URL

https://arxiv.org/abs/2303.06747

PDF

https://arxiv.org/pdf/2303.06747.pdf

