Abstract
Pre-trained diffusion models for image generation encode a rich prior over intricate textures, and leveraging this prior for image super-resolution is a compelling direction. However, prevailing diffusion-based methods overlook the constraints that degradation information imposes on the diffusion process. They also ignore the spatial variability of the estimated blur kernel, which arises from factors such as motion jitter and out-of-focus regions in open-environment scenarios. As a result, their super-resolution outputs deviate noticeably from the ground truth. To address these issues, we introduce a framework called Adaptive Multi-modal Fusion of \textbf{S}patially Variant Kernel Refinement with Diffusion Model for Blind Image \textbf{S}uper-\textbf{R}esolution (SSR). Within SSR, we propose a Spatially Variant Kernel Refinement (SVKR) module that estimates a Depth-Informed Kernel, a spatially variant blur kernel that takes depth information into account. SVKR also improves the accuracy of the depth map estimated from the LR image, enabling mutual refinement between the depth map and blur kernel estimates. Finally, we introduce an Adaptive Multi-Modal Fusion (AMF) module that aligns information from three modalities: the low-resolution image, the depth map, and the blur kernel. This alignment constrains the diffusion model to generate more faithful SR results. Quantitative and qualitative experiments confirm the superiority of our approach, and ablation studies corroborate the effectiveness of the proposed modules.
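The multi-modal conditioning idea above can be illustrated with a minimal sketch. The code below is a hypothetical toy, not the paper's AMF module: it simply concatenates LR-image, depth-map, and blur-kernel feature channels and projects them with a 1x1 convolution (implemented as an einsum) to produce conditioning features that a diffusion model could consume. All shapes, channel counts, and the function name `fuse_modalities` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_modalities(lr_img, depth_map, kernel_map, weight):
    """Toy multi-modal fusion (NOT the paper's AMF module).

    Concatenates the three modalities along the channel axis and applies a
    1x1 convolution, expressed as an einsum over channels at each pixel.
    Inputs are (C, H, W) arrays sharing the same spatial size; `weight` maps
    the concatenated channels to the output conditioning channels.
    """
    fused = np.concatenate([lr_img, depth_map, kernel_map], axis=0)  # (C_total, H, W)
    return np.einsum("oc,chw->ohw", weight, fused)                   # (C_out, H, W)

# Illustrative inputs: 3-channel LR image, 1-channel depth, 1-channel kernel map.
lr = rng.standard_normal((3, 32, 32))
depth = rng.standard_normal((1, 32, 32))
kern = rng.standard_normal((1, 32, 32))
w = rng.standard_normal((64, 3 + 1 + 1))  # project 5 channels to 64 conditioning channels

cond = fuse_modalities(lr, depth, kern, w)
print(cond.shape)  # (64, 32, 32)
```

In a real system, each modality would first pass through its own encoder before fusion, and the fused features would condition the diffusion denoiser (e.g. via cross-attention or feature injection); the channel-concatenation step here only shows the alignment of spatially co-registered modalities.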
URL
https://arxiv.org/abs/2403.05808