Abstract
Real-world images often exhibit multiple degradations at once, such as rain and fog at night (a triple degradation). However, people do not always want every degradation removed: when a blurry lens captures a beautiful snowy landscape (a double degradation), they may only want to deblur. These situations point to a new challenge in image restoration, in which a model must perceive and remove the specific degradation types named by a human command from an image with multiple degradations. We term this task Referring Flexible Image Restoration (RFIR). To address it, we first construct a large-scale synthetic dataset, also called RFIR, comprising 153,423 samples, each consisting of a degraded image, a text prompt specifying the degradation to remove, and the restored image. RFIR covers five basic degradation types (blur, rain, haze, low light, and snow) and includes six main sub-categories for varying degrees of degradation removal. To tackle the task, we propose a novel transformer-based multi-task model named TransRFIR, which simultaneously perceives the degradation types in the degraded image and removes the specific degradation indicated by the text prompt. TransRFIR is built on two attention modules that we devise, Multi-Head Agent Self-Attention (MHASA) and Multi-Head Agent Cross Attention (MHACA). Both introduce an agent token and achieve linear complexity, giving lower computation cost than vanilla self-attention and cross-attention while obtaining competitive performance. TransRFIR achieves state-of-the-art performance compared with its counterparts and proves to be an effective architecture for image restoration. We release our project at this https URL.
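The abstract does not spell out how the agent token yields linear complexity, so the following is only a minimal sketch of the general agent-token idea, not the authors' MHASA or MHACA. It assumes a single head, no learned projections, and agent tokens obtained by average-pooling the queries; the function name `agent_attention` and all shapes are hypothetical. A small set of n agent tokens first summarizes the keys and values, and the N queries then attend only to those n summaries, so the cost grows as O(N n d) rather than O(N^2 d).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(q, k, v, agents):
    """Hypothetical single-head agent attention (illustrative only).

    q: (N, d) queries, k: (M, d) keys, v: (M, d) values,
    agents: (n, d) agent tokens with n much smaller than N and M.
    """
    scale = 1.0 / np.sqrt(q.shape[-1])
    # Step 1: each agent aggregates the values. (n, M) @ (M, d) -> (n, d)
    agent_values = softmax(agents @ k.T * scale) @ v
    # Step 2: each query reads from the agents. (N, n) @ (n, d) -> (N, d)
    return softmax(q @ agents.T * scale) @ agent_values

rng = np.random.default_rng(0)
N, M, n, d = 64, 5, 8, 16          # assumed sizes, chosen for illustration
x = rng.normal(size=(N, d))        # stand-in for image tokens
t = rng.normal(size=(M, d))        # stand-in for text-prompt tokens

# Assumption: agents are average-pooled groups of image tokens.
agents = x.reshape(n, N // n, d).mean(axis=1)

out_self = agent_attention(x, x, x, agents)    # self-attention style use
out_cross = agent_attention(x, t, t, agents)   # cross-attention style use
print(out_self.shape, out_cross.shape)
```

In the self-attention call the queries, keys, and values all come from the image tokens; in the cross-attention call the keys and values come from the text prompt, which is one plausible way a prompt could steer which degradation is removed. Neither score matrix is ever N by N, which is where the saving over vanilla attention comes from.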
URL
https://arxiv.org/abs/2404.10342