Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

Abstract

As text-to-image and image-to-image generative models mature, AI-generated images (AGIs) have shown great application potential in advertising, entertainment, education, social media, and other areas. Although generative models have advanced remarkably, little effort has gone into designing quality assessment models for their output. In this paper, we propose a novel blind image quality assessment (IQA) network for AGIs, named AMFF-Net. AMFF-Net evaluates AGI quality along three dimensions: "visual quality", "authenticity", and "consistency". Specifically, inspired by the characteristics of the human visual system and motivated by the observation that "visual quality" and "authenticity" are characterized by both local and global aspects, AMFF-Net scales the image up and down and takes the scaled images together with the original-sized image as inputs to obtain multi-scale features. An Adaptive Feature Fusion (AFF) block then fuses the multi-scale features adaptively with learnable weights. In addition, to account for the correlation between the image and its prompt, AMFF-Net compares the semantic features from a text encoder and an image encoder to evaluate text-to-image alignment. We carry out extensive experiments on three AGI quality assessment databases, and the results show that AMFF-Net outperforms nine state-of-the-art blind IQA methods. Ablation experiments further demonstrate the effectiveness of the proposed multi-scale input strategy and the AFF block.
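
The abstract does not give the internal form of the AFF block or of the alignment comparison, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes that each scale yields a feature vector of the same length, that the learnable weights are normalized with a softmax, and that text-image alignment is scored with cosine similarity. The function names `fuse_scales` and `cosine_similarity` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Turn raw per-scale logits into positive weights that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(features, logits):
    """Weighted sum of per-scale feature vectors.

    features: one equal-length vector per input scale
              (for example down-scaled, original, up-scaled).
    logits:   one raw weight per scale; in a trained network these
              would be learned, here they are plain numbers (assumption).
    """
    if len(features) != len(logits):
        raise ValueError("need exactly one logit per scale")
    weights = softmax(logits)
    dim = len(features[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, features))
        for i in range(dim)
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy features for three scales; equal logits give a plain average.
scale_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused = fuse_scales(scale_features, [0.0, 0.0, 0.0])

# Toy text and image embeddings standing in for encoder outputs.
alignment = cosine_similarity([1.0, 0.0], [1.0, 0.0])
```

With equal logits the fusion reduces to a simple mean over scales; raising one logit shifts the fused vector toward that scale, which is the sense in which the weighting is adaptive in this sketch.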

URL

https://arxiv.org/abs/2404.15163

PDF

https://arxiv.org/pdf/2404.15163.pdf
