Abstract
Though diffusion models have been successfully applied to various image restoration (IR) tasks, their performance is sensitive to the choice of training datasets. Typically, diffusion models trained on specific datasets fail to recover images with out-of-distribution degradations. To address this problem, this work leverages a capable vision-language model and a synthetic degradation pipeline to learn image restoration in the wild (wild IR). More specifically, all low-quality images are simulated with a synthetic degradation pipeline that contains multiple common degradations such as blur, resize, noise, and JPEG compression. Then we introduce robust training for a degradation-aware CLIP model to extract enriched image content features that assist high-quality image restoration. Our base diffusion model is the image restoration SDE (IR-SDE). Building upon it, we further present a posterior sampling strategy for fast noise-free image generation. We evaluate our model on both synthetic and real-world degradation datasets. Moreover, experiments on the unified image restoration task illustrate that the proposed posterior sampling improves image generation quality for various degradations.
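The degradation chain the abstract describes (blur, resize, noise, JPEG compression) can be illustrated with a minimal NumPy sketch. This is not the paper's actual pipeline: the blur kernel, parameter values, and ordering are illustrative assumptions, and the JPEG step is approximated here by crude uniform quantization rather than real DCT-based compression.

```python
import numpy as np

def degrade(img, blur_sigma=1.5, scale=2, noise_sigma=10.0, quant_step=16, seed=0):
    """Toy degradation chain: Gaussian blur -> downscale -> noise -> quantization.

    img: 2-D float array with values in [0, 255]. All parameters are
    illustrative, not the values used in the paper.
    """
    rng = np.random.default_rng(seed)

    # 1) separable Gaussian blur along rows, then columns
    radius = int(3 * blur_sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * blur_sigma**2))
    kernel /= kernel.sum()
    blurred = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, blurred)

    # 2) downscale by striding (nearest-neighbour resize)
    small = blurred[::scale, ::scale]

    # 3) additive Gaussian noise
    noisy = small + rng.normal(0.0, noise_sigma, small.shape)

    # 4) crude stand-in for JPEG compression: uniform pixel quantization
    return np.clip(np.round(noisy / quant_step) * quant_step, 0.0, 255.0)

# Simulate a low-quality image from a synthetic high-quality gradient
hq = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))
lq = degrade(hq)
```

In training, each such low-quality image would be paired with its clean source, giving the restoration model supervised (degraded, clean) pairs without needing real-world captures.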
URL
https://arxiv.org/abs/2404.09732