Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

2024-04-23 16:02:33
Tianwei Zhou, Songbai Tan, Wei Zhou, Yu Luo, Yuan-Gen Wang, Guanghui Yue

Abstract

With the increasing maturity of text-to-image and image-to-image generative models, AI-generated images (AGIs) have shown great application potential in advertising, entertainment, education, social media, and other fields. Although generative models have advanced remarkably, little effort has been devoted to designing quality assessment models for their outputs. In this paper, we propose a novel blind image quality assessment (IQA) network for AGIs, named AMFF-Net. AMFF-Net evaluates AGI quality along three dimensions: "visual quality", "authenticity", and "consistency". Specifically, inspired by the characteristics of the human visual system and motivated by the observation that "visual quality" and "authenticity" are characterized by both local and global aspects, AMFF-Net upscales and downscales the image and takes the rescaled images, together with the original-sized image, as inputs to obtain multi-scale features. An Adaptive Feature Fusion (AFF) block then fuses the multi-scale features adaptively using learnable weights. In addition, considering the correlation between the image and the prompt, AMFF-Net compares the semantic features from the text encoder and the image encoder to evaluate text-to-image alignment. We carry out extensive experiments on three AGI quality assessment databases, and the results show that AMFF-Net outperforms nine state-of-the-art blind IQA methods. Ablation experiments further demonstrate the effectiveness of the proposed multi-scale input strategy and the AFF block.
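
The fusion and alignment ideas in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the internal form of the AFF block or the comparison used for text-to-image alignment. The sketch assumes softmax-normalised scalar weights (one per scale) for fusion and cosine similarity for comparing text and image features; the function names `fuse_features` and `cosine_similarity` and all vectors are hypothetical.

```python
import math


def softmax(logits):
    """Turn unconstrained scores into positive weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(features, logits):
    """Weighted sum of same-length feature vectors, one per image scale.

    `logits` stands in for the learnable weights (assumption: one scalar
    per scale, normalised with softmax). In a real network they would be
    trained by gradient descent; here they are plain numbers.
    """
    if len(features) != len(logits):
        raise ValueError("need exactly one logit per feature vector")
    weights = softmax(logits)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (assumed alignment score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy features from the downscaled, original, and upscaled image (made up).
multi_scale = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused = fuse_features(multi_scale, logits=[0.0, 0.0, 0.0])

# Toy text and image embeddings (made up) for the alignment score.
alignment = cosine_similarity([1.0, 0.0], [1.0, 1.0])
```

With equal logits the fused vector is the plain average of the three scales; raising one logit shifts the fusion toward that scale, which is the sense in which such weights are adaptive.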

URL

https://arxiv.org/abs/2404.15163

PDF

https://arxiv.org/pdf/2404.15163.pdf

