Large Multi-modality Model Assisted AI-Generated Image Quality Assessment

Abstract

Traditional deep neural network (DNN)-based image quality assessment (IQA) models use convolutional neural networks (CNNs) or Transformers to learn quality-aware feature representations, and they perform well on natural scene images. When applied to AI-generated images (AGIs), however, these DNN-based IQA models perform poorly. This is largely due to the semantic inaccuracies present in some AGIs, which are caused by the uncontrollable nature of the generation process. The ability to discern semantic content is therefore crucial for assessing the quality of AGIs. Traditional DNN-based IQA models, constrained by limited parameter complexity and training data, struggle to capture complex fine-grained semantic features, which makes it hard for them to judge whether semantic content exists and is coherent across the entire image. To address this shortfall in the semantic content perception of current IQA models, we introduce a large Multi-modality model Assisted AI-Generated Image Quality Assessment (MA-AGIQA) model, which uses semantically informed guidance to sense semantic information and extracts semantic vectors through carefully designed text prompts. Moreover, it employs a mixture of experts (MoE) structure to dynamically integrate the semantic information with the quality-aware features extracted by traditional DNN-based IQA models. Comprehensive experiments on two AI-generated content datasets, AIGCQA-20k and AGIQA-3k, show that MA-AGIQA achieves state-of-the-art performance and demonstrate its superior generalization capability in assessing the quality of AGIs. Code is available at this https URL.
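The abstract does not specify how the MoE structure is built, so the following is only a minimal sketch of one common way to fuse several feature vectors with a learned softmax gate. It is not the authors' implementation. The function names (`softmax`, `moe_fuse`), the gate parameterization, the feature dimension, and the random demo values are all assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_fuse(features, gate_w, gate_b):
    """Fuse k feature vectors of equal length d with a softmax gate.

    features: list of k vectors, e.g. one quality-aware vector from a
              DNN-based IQA model and semantic vectors from a large
              multi-modality model (hypothetical inputs).
    gate_w:   list of k weight vectors, each of length k * d (assumed
              linear gate over the concatenated features).
    gate_b:   list of k gate biases.
    Returns (fused_vector, gate_weights).
    """
    flat = [v for row in features for v in row]
    logits = [
        sum(x * w for x, w in zip(flat, col)) + b
        for col, b in zip(gate_w, gate_b)
    ]
    weights = softmax(logits)
    dim = len(features[0])
    fused = [
        sum(w * row[j] for w, row in zip(weights, features))
        for j in range(dim)
    ]
    return fused, weights


# Demo with random stand-ins for real model outputs (illustrative only).
random.seed(0)
d = 4
quality_feat = [random.gauss(0, 1) for _ in range(d)]     # from a DNN IQA model
semantic_feat_a = [random.gauss(0, 1) for _ in range(d)]  # from text prompt A
semantic_feat_b = [random.gauss(0, 1) for _ in range(d)]  # from text prompt B
feats = [quality_feat, semantic_feat_a, semantic_feat_b]

gate_w = [[random.gauss(0, 0.1) for _ in range(3 * d)] for _ in range(3)]
gate_b = [0.0, 0.0, 0.0]

fused, weights = moe_fuse(feats, gate_w, gate_b)

# A hypothetical linear head maps the fused vector to a scalar quality score.
head_w = [random.gauss(0, 1) for _ in range(d)]
score = sum(f * w for f, w in zip(fused, head_w))
```

Because the gate weights depend on the input features, each image gets its own mix of quality-aware and semantic information, which is the sense in which the integration is dynamic. In a trained model the gate and head parameters would be learned rather than sampled.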

URL

https://arxiv.org/abs/2404.17762

PDF

https://arxiv.org/pdf/2404.17762.pdf
