Large Multi-modality Model Assisted AI-Generated Image Quality Assessment

2024-04-27 02:40:36
Puyi Wang, Wei Sun, Zicheng Zhang, Jun Jia, Yanwei Jiang, Zhichao Zhang, Xiongkuo Min, Guangtao Zhai

Abstract

Traditional deep neural network (DNN)-based image quality assessment (IQA) models leverage convolutional neural networks (CNNs) or Transformers to learn quality-aware feature representations, achieving commendable performance on natural scene images. However, when applied to AI-generated images (AGIs), these DNN-based IQA models exhibit subpar performance. This is largely due to the semantic inaccuracies inherent in certain AGIs, which are caused by the uncontrollable nature of the generation process. Thus, the capability to discern semantic content becomes crucial for assessing the quality of AGIs. Traditional DNN-based IQA models, constrained by limited parameter complexity and training data, struggle to capture complex fine-grained semantic features, which makes it hard for them to grasp the existence and coherence of semantic content across the entire image. To address this shortfall in the semantic content perception of current IQA models, we introduce a large Multi-modality model Assisted AI-Generated Image Quality Assessment (MA-AGIQA) model, which utilizes semantically informed guidance to sense semantic information and extract semantic vectors through carefully designed text prompts. Moreover, it employs a mixture of experts (MoE) structure to dynamically integrate the semantic information with the quality-aware features extracted by traditional DNN-based IQA models. Comprehensive experiments conducted on two AI-generated content datasets, AIGCQA-20k and AGIQA-3k, show that MA-AGIQA achieves state-of-the-art performance and demonstrate its superior generalization capabilities in assessing the quality of AGIs. Code is available at this https URL.
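
The MoE-style integration described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the architecture details, so the number of experts (one quality-aware vector plus two semantic vectors), the shared feature dimension, the linear gating layer, and all function names here (`moe_fuse`, `predict_quality`, and so on) are assumptions made purely for illustration. The sketch only shows the general idea of a gate that looks at all feature vectors and produces input-dependent weights for combining them.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def moe_fuse(expert_features, gate_w, gate_b):
    """Fuse equally sized feature vectors with input-dependent weights.

    expert_features: list of feature vectors, one per (hypothetical) expert.
    gate_w, gate_b:  parameters of an assumed linear gate that maps the
                     concatenated features to one logit per expert.
    Returns the fused vector and the gate weights.
    """
    gate_input = [v for feat in expert_features for v in feat]
    weights = softmax(linear(gate_w, gate_b, gate_input))
    dim = len(expert_features[0])
    fused = [sum(w * feat[i] for w, feat in zip(weights, expert_features))
             for i in range(dim)]
    return fused, weights


def predict_quality(fused, head_w, head_b):
    """Assumed linear regression head mapping fused features to a score."""
    return sum(w * v for w, v in zip(head_w, fused)) + head_b


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_experts = 4, 3
    # Placeholder features: in a real system these would come from a DNN-based
    # IQA backbone and from a large multi-modality model queried with prompts.
    features = [[rng.uniform(-1, 1) for _ in range(dim)]
                for _ in range(n_experts)]
    gate_w = [[rng.uniform(-0.5, 0.5) for _ in range(dim * n_experts)]
              for _ in range(n_experts)]
    gate_b = [0.0] * n_experts
    fused, weights = moe_fuse(features, gate_w, gate_b)
    score = predict_quality(fused, [0.25] * dim, 0.0)
    print("gate weights:", weights)
    print("predicted score:", score)
```

Because the gate weights are a softmax over logits computed from the features themselves, the contribution of each feature source changes from image to image, which is one common way to realize the "dynamic" integration the abstract mentions.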

URL

https://arxiv.org/abs/2404.17762

PDF

https://arxiv.org/pdf/2404.17762.pdf

