Dual-Branch Network for Portrait Image Quality Assessment

2024-05-14 12:43:43
Wei Sun, Weixia Zhang, Yanwei Jiang, Haoning Wu, Zicheng Zhang, Jun Jia, Yingjie Zhou, Zhongpeng Ji, Xiongkuo Min, Weisi Lin, Guangtao Zhai

Abstract

Portrait images typically consist of a salient person against diverse backgrounds. With the development of mobile devices and image processing techniques, users can conveniently capture portrait images anytime and anywhere. However, the quality of these portraits may be degraded by unfavorable environmental conditions, subpar photography techniques, and inferior capturing devices. In this paper, we introduce a dual-branch network for portrait image quality assessment (PIQA) that accounts for how both the salient person and the background of a portrait image influence its visual quality. Specifically, we use two backbone networks (i.e., Swin Transformer-B) to extract quality-aware features from the entire portrait image and from the facial image cropped from it. To enhance the quality-aware feature representation of the backbones, we pre-train them on the large-scale video quality assessment dataset LSVQ and the large-scale facial image quality assessment dataset GFIQA. Additionally, we leverage LIQE, an image scene classification and quality assessment model, to capture quality-aware and scene-specific features as auxiliary features. Finally, we concatenate these features and regress them into quality scores with a multilayer perceptron (MLP). We train the model with the fidelity loss in a learning-to-rank manner to mitigate inconsistencies in the quality scores of the portrait image quality assessment dataset PIQ. Experimental results demonstrate that the proposed model achieves superior performance on the PIQ dataset, validating its effectiveness. The code is available at this https URL.
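
The abstract names the fidelity loss and learning-to-rank training but does not spell out the formula. The sketch below shows one common formulation of a pairwise fidelity loss, as an illustration only and not necessarily the authors' exact implementation. It assumes a Thurstone-style model with unit variance to turn a score difference into a preference probability; the function names (`preference_probability`, `fidelity_loss`, `mean_pairwise_fidelity_loss`) are hypothetical and chosen for this example.

```python
import math
from itertools import combinations


def preference_probability(score_a, score_b):
    """Probability that image a is judged better than image b.

    Assumption: Thurstone-style model with unit variance, so the
    probability is the standard normal CDF of (a - b) / sqrt(2).
    Using Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) gives erf((a - b) / 2).
    """
    return 0.5 * (1.0 + math.erf((score_a - score_b) / 2.0))


def fidelity_loss(pred_a, pred_b, mos_a, mos_b):
    """Fidelity loss for one image pair.

    The binary target depends only on the ORDER of the ground-truth
    scores, which is what makes this a ranking loss rather than a
    regression loss on the raw score values.
    """
    if mos_a > mos_b:
        target = 1.0
    elif mos_a < mos_b:
        target = 0.0
    else:
        target = 0.5
    p_hat = preference_probability(pred_a, pred_b)
    return (
        1.0
        - math.sqrt(target * p_hat)
        - math.sqrt((1.0 - target) * (1.0 - p_hat))
    )


def mean_pairwise_fidelity_loss(preds, mos):
    """Average the fidelity loss over all unordered pairs in a batch."""
    pairs = list(combinations(range(len(preds)), 2))
    total = sum(fidelity_loss(preds[i], preds[j], mos[i], mos[j]) for i, j in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    mos = [4.0, 3.0, 1.5]           # hypothetical ground-truth scores
    good = [2.0, 1.0, -1.0]         # predictions in the correct order
    bad = [-1.0, 1.0, 2.0]          # predictions in the reversed order
    print(mean_pairwise_fidelity_loss(good, mos))
    print(mean_pairwise_fidelity_loss(bad, mos))
```

Because only the relative order of the ground-truth scores enters the target, a loss of this kind is insensitive to score offsets or scale differences between annotation groups, which is one plausible reason a ranking objective helps with inconsistent quality labels. In a real training loop the same expression would be written with a differentiable tensor library and a small epsilon inside the square roots for numerical stability.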

URL

https://arxiv.org/abs/2405.08555

PDF

https://arxiv.org/pdf/2405.08555.pdf

