Blind Multimodal Quality Assessment: A Brief Survey and A Case Study of Low-light Images

2023-03-18 09:04:55
Miaohui Wang, Zhuowei Xu, Mai Xu, Weisi Lin

Abstract

Blind image quality assessment (BIQA) aims at automatically and accurately forecasting objective scores for visual signals, which has been widely used to monitor product and service quality in low-light applications, covering smartphone photography, video surveillance, autonomous driving, etc. Recent developments in this field are dominated by unimodal solutions inconsistent with human subjective rating patterns, where human visual perception is simultaneously reflected by multiple sensory information (e.g., sight and hearing). In this article, we present a unique blind multimodal quality assessment (BMQA) of low-light images from subjective evaluation to objective score. To investigate the multimodal mechanism, we first establish a multimodal low-light image quality (MLIQ) database with authentic low-light distortions, containing image and audio modality pairs. Further, we specially design the key modules of BMQA, considering multimodal quality representation, latent feature alignment and fusion, and hybrid self-supervised and supervised learning. Extensive experiments show that our BMQA yields state-of-the-art accuracy on the proposed MLIQ benchmark database. In particular, we also build an independent single-image modality Dark-4K database, which is used to verify its applicability and generalization performance in mainstream unimodal applications. Qualitative and quantitative results on Dark-4K show that BMQA achieves superior performance to existing BIQA approaches as long as a pre-trained quality semantic description model is provided. The proposed framework and two databases as well as the collected BIQA methods and evaluation metrics are made publicly available.

URL

https://arxiv.org/abs/2303.10369

PDF

https://arxiv.org/pdf/2303.10369.pdf

