Abstract
Normalizing flow models using invertible neural networks (INN) have been widely investigated for generative image super-resolution (SR) by learning the transformation between the normal distribution of a latent variable $z$ and the conditional distribution of high-resolution (HR) images given a low-resolution (LR) input. Recently, image rescaling models like IRN utilize the bidirectional nature of INN to push the performance limit of image upscaling by optimizing the downscaling and upscaling steps jointly. While random sampling of the latent variable $z$ is useful for generating diverse photo-realistic images, it is not desirable for image rescaling, where accurate restoration of the HR image is more important. Hence, in place of random sampling of $z$, we propose auxiliary encoding modules to further push the limit of image rescaling performance. Two options for storing the encoded latent variables alongside the downscaled LR images are proposed, both readily supported by existing image file formats: one saves them as the alpha channel, the other as meta-data in the image header, with the corresponding modules denoted by the suffixes -A and -M respectively. Optimal network architectural changes are investigated for both options to demonstrate their effectiveness in raising the rescaling performance limit on different baseline models, including IRN and DLV-IRN.
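To make the "-A" storage option concrete, the following is a minimal illustrative sketch (not the paper's code): a quantized latent map $z$ with the same spatial size as the LR image is packed into a fourth (alpha) channel of an RGBA array, so a standard lossless format such as PNG can carry both parts, and the upscaling step can split them back apart. The array shapes and 8-bit quantization here are assumptions for illustration.

```python
import numpy as np

def pack_latent_as_alpha(lr_rgb: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Stack an HxWx3 LR image and an HxW quantized latent map into HxWx4 RGBA."""
    return np.dstack([lr_rgb, z])

def unpack_latent_from_alpha(rgba: np.ndarray):
    """Split an HxWx4 RGBA array back into the LR image and the latent channel."""
    return rgba[..., :3], rgba[..., 3]

# Toy data: a 64x64 downscaled LR image and an 8-bit quantized latent map.
lr = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
z = np.random.randint(0, 256, (64, 64), dtype=np.uint8)

rgba = pack_latent_as_alpha(lr, z)          # stored as one RGBA image
lr_back, z_back = unpack_latent_from_alpha(rgba)
assert np.array_equal(lr_back, lr) and np.array_equal(z_back, z)
```

The "-M" option would instead serialize the same quantized latent bytes into the image header's metadata fields, leaving the pixel channels untouched.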
URL
https://arxiv.org/abs/2303.06747