Referring Flexible Image Restoration

2024-04-16 07:25:17
Runwei Guan, Rongsheng Hu, Zhuhao Zhou, Tianlang Xue, Ka Lok Man, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue

Abstract

In practice, images often exhibit several degradations at once, such as rain and fog at night (a triple degradation). In many cases, however, users do not want every degradation removed. Given a blurry photograph of a beautiful snowy landscape (a double degradation), for instance, they may want only the blur removed. These situations pose a new challenge in image restoration: a model must perceive and remove only the degradation types specified by a human command in images that contain multiple degradations. We term this task Referring Flexible Image Restoration (RFIR). To address it, we first construct a large-scale synthetic dataset, also called RFIR, comprising 153,423 samples, each consisting of a degraded image, a text prompt specifying which degradation to remove, and the restored image. RFIR covers five basic degradation types (blur, rain, haze, low light and snow) and includes six main sub-categories for varying degrees of degradation removal. To tackle the task, we propose a novel transformer-based multi-task model named TransRFIR, which simultaneously perceives the degradation types in the degraded image and removes the specific degradation named in the text prompt. TransRFIR is built on two newly devised attention modules, Multi-Head Agent Self-Attention (MHASA) and Multi-Head Agent Cross Attention (MHACA). Both introduce agent tokens and reach linear complexity, giving a lower computation cost than vanilla self-attention and cross-attention while obtaining competitive performance. TransRFIR achieves state-of-the-art performance compared with its counterparts and proves to be an effective architecture for image restoration. We release our project at this https URL.
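
The abstract says that MHASA and MHACA reach linear complexity by introducing agent tokens, but it does not describe the modules themselves. The following is a minimal single-head sketch of the general agent-attention idea in pure Python, not the authors' implementation. It assumes that a small set of agent tokens is obtained by average-pooling the queries, that the agents first aggregate information from the keys and values, and that the queries then read from the agents. The function names, the pooling scheme and the absence of learned projections or multiple heads are all assumptions made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(x - m) for x in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    scores = matmul(q, transpose(k))
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    return matmul(softmax_rows(scores), v)


def pool_agents(q, num_agents):
    """Average contiguous chunks of query tokens into agent tokens.

    The pooling scheme is an assumption of this sketch.
    """
    n = len(q)
    if not 1 <= num_agents <= n:
        raise ValueError("num_agents must be between 1 and the number of queries")
    agents = []
    for i in range(num_agents):
        chunk = q[i * n // num_agents:(i + 1) * n // num_agents]
        agents.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return agents


def agent_attention(q, k, v, num_agents):
    """Two-step attention routed through num_agents agent tokens."""
    agents = pool_agents(q, num_agents)        # shape (m, d)
    agent_values = attention(agents, k, v)     # agents attend to k, v: m x n scores
    return attention(q, agents, agent_values)  # queries attend to agents: n x m scores


if __name__ == "__main__":
    random.seed(0)
    n, d, m = 8, 4, 2
    q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    k = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    v = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    out = agent_attention(q, k, v, m)
    print(len(out), len(out[0]))  # prints "8 4"
```

With n tokens and m agents, the two score matrices have sizes m x n and n x m, so the cost grows as O(n·m·d) instead of the O(n²·d) of vanilla attention, which is linear in n when m is fixed. In a cross-attention setting, `k` and `v` would come from a different token sequence than `q`, for example text-prompt tokens; that pairing is likewise an assumption here.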

URL

https://arxiv.org/abs/2404.10342

PDF

https://arxiv.org/pdf/2404.10342.pdf

