Paper Reading AI Learner

Q-Insight: Understanding Image Quality via Visual Reinforcement Learning

2025-03-28 17:59:54
Weiqi Li, Xuanyu Zhang, Shijie Zhao, Yabin Zhang, Junlin Li, Li Zhang, Jian Zhang

Abstract

Image quality assessment (IQA) focuses on the perceptual visual quality of images and plays a crucial role in downstream tasks such as image reconstruction, compression, and generation. The rapid advancement of multi-modal large language models (MLLMs) has significantly broadened the scope of IQA, moving it toward comprehensive image quality understanding that incorporates content analysis, degradation perception, and comparison reasoning beyond mere numerical scoring. Previous MLLM-based methods typically either generate numerical scores that lack interpretability or rely heavily on supervised fine-tuning (SFT) with large-scale annotated datasets to provide descriptive assessments, which limits their flexibility and applicability. In this paper, we propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO), which demonstrates strong visual reasoning capability for image quality understanding while requiring only a small number of rating scores and degradation labels. By jointly optimizing score regression and degradation perception tasks with carefully designed reward functions, our approach effectively exploits their mutual benefits for enhanced performance. Extensive experiments demonstrate that Q-Insight substantially outperforms existing state-of-the-art methods in both score regression and degradation perception tasks, while exhibiting impressive zero-shot generalization to comparison reasoning tasks. Code will be available at this https URL.
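The abstract names GRPO and "carefully designed reward functions" but does not give their form. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the functions `score_reward` and `degradation_reward`, the binary reward definitions, and the 0.35 tolerance are all hypothetical. What it does show is the group-relative normalization that gives GRPO its name: several responses are sampled for the same image, and each response's reward is standardized against the mean and standard deviation of its own group, so no learned value critic is needed.

```python
import statistics
from typing import List


def score_reward(pred: float, target: float, tol: float = 0.35) -> float:
    """Hypothetical score-regression reward: 1 if the predicted quality
    score lies within `tol` of the ground-truth rating, else 0."""
    return 1.0 if abs(pred - target) <= tol else 0.0


def degradation_reward(pred_type: str, true_type: str) -> float:
    """Hypothetical degradation-perception reward: 1 if the predicted
    degradation class matches the label (case-insensitive), else 0."""
    return 1.0 if pred_type.strip().lower() == true_type.strip().lower() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each sampled response's reward by
    the mean and (population) standard deviation of its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every response earns the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers for one image whose (assumed) label is 3.8.
    preds = [3.7, 2.9, 4.1, 3.5]
    rewards = [score_reward(p, 3.8) for p in preds]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because the advantages are centered within each group, responses that score better than their siblings are reinforced and the rest are penalized; a group in which every response earns the same reward contributes zero advantage. A joint objective could, for example, sum the score and degradation rewards per response before normalizing, but how Q-Insight actually combines them is not stated here.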

URL

https://arxiv.org/abs/2503.22679

PDF

https://arxiv.org/pdf/2503.22679.pdf
