Degradation-Aware Image Enhancement via Vision-Language Classification

2025-06-05 17:42:01
Jie Cai, Kangning Yang, Jiaming Ding, Lan Fu, Ling Ouyang, Jiang Li, Jinglin Shen, Zibo Meng

Abstract

Image degradation is a prevalent issue in various real-world applications, affecting visual quality and downstream processing tasks. In this study, we propose a novel framework that employs a Vision-Language Model (VLM) to automatically classify degraded images into predefined categories. The VLM categorizes an input image into one of four degradation types: (A) super-resolution degradation (including noise, blur, and JPEG compression), (B) reflection artifacts, (C) motion blur, or (D) no visible degradation (high-quality image). Once classified, images assigned to categories A, B, or C undergo targeted restoration using dedicated models tailored for each specific degradation type. The final output is a restored image with improved visual quality. Experimental results demonstrate the effectiveness of our approach in accurately classifying image degradations and enhancing image quality through specialized restoration models. Our method presents a scalable and automated solution for real-world image enhancement tasks, leveraging the capabilities of VLMs in conjunction with state-of-the-art restoration techniques.
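The classify-then-route flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Degradation`, `parse_label`, and `enhance` are hypothetical, the VLM is stood in for by any callable that returns a text reply, and the fallback of treating an unparseable reply as category D (leave the image untouched) is an assumption the abstract does not state.

```python
import re
from enum import Enum
from typing import Any, Callable, Dict


class Degradation(Enum):
    """The four categories named in the abstract."""
    SUPER_RESOLUTION = "A"  # noise, blur, JPEG compression
    REFLECTION = "B"        # reflection artifacts
    MOTION_BLUR = "C"       # motion blur
    NONE = "D"              # no visible degradation


# Hypothetical type aliases: an image is any object, a restorer maps image -> image.
Classifier = Callable[[Any], str]
Restorer = Callable[[Any], Any]


def parse_label(reply: str) -> Degradation:
    """Read a leading category letter such as 'A', '(B)' or 'C.' from a VLM reply.

    Assumption: replies that do not start with a standalone letter A-D
    are treated as NONE, so the image passes through unchanged.
    """
    match = re.match(r"\(?([A-D])\)?(?![A-Za-z])", reply.strip())
    return Degradation(match.group(1)) if match else Degradation.NONE


def enhance(image: Any, classify: Classifier,
            restorers: Dict[Degradation, Restorer]) -> Any:
    """Classify the image, then route it to the matching restoration model."""
    label = parse_label(classify(image))
    if label is Degradation.NONE:
        return image  # category D: already high quality, nothing to restore
    return restorers[label](image)
```

In use, `classify` would wrap a prompt to a vision-language model and `restorers` would map categories A, B, and C to their dedicated restoration models; both are left as plain callables here so the routing logic stays independent of any particular model or library.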

URL

https://arxiv.org/abs/2506.05450

PDF

https://arxiv.org/pdf/2506.05450.pdf
