Paper Reading AI Learner

Causal Perception Inspired Representation Learning for Trustworthy Image Quality Assessment

2024-04-30 13:55:30
Lei Wang, Desen Yuan

Abstract

Despite great success in modeling visual perception, deep neural network based image quality assessment (IQA) remains unreliable in real-world applications because of its vulnerability to adversarial perturbations and its opaque black-box structure. In this paper, we propose to build a trustworthy IQA model via Causal Perception inspired Representation Learning (CPRL), together with a score reflection attack method for IQA models. More specifically, we assume that each image is composed of a Causal Perception Representation (CPR) and a non-causal perception representation (N-CPR). CPR serves as the cause of the subjective quality label and is invariant to imperceptible adversarial perturbations. Conversely, N-CPR exhibits spurious associations with the subjective quality label and may change significantly under adversarial perturbations. To extract the CPR from each input image, we develop a soft ranking based channel-wise activation function to mediate the causally sufficient (beneficial for high prediction accuracy) and necessary (beneficial for high robustness) deep features, and we optimize it with an intervention-based minimax game. Experiments on four benchmark databases show that the proposed CPRL method outperforms many state-of-the-art adversarial defense methods and provides explicit model interpretation.
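The abstract does not give the exact form of the soft ranking based channel-wise activation function. The following is a minimal, hypothetical sketch of one common way to build such a gate, assuming that channel importance is the mean absolute activation and that the gate softly keeps the top-k channels through a sigmoid-relaxed rank. All names (`soft_rank`, `soft_topk_gate`, `channel_activation`, `k`, `tau`, `tau_gate`) are illustrative assumptions, not the paper's definitions.

```python
import math
from typing import List


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_rank(scores: List[float], tau: float = 0.1) -> List[float]:
    """Differentiable approximation of descending rank (1 = largest score).

    rank_i ~= 1 + sum_{j != i} sigmoid((s_j - s_i) / tau): every channel j
    scoring higher than channel i adds ~1, every lower one adds ~0.
    """
    n = len(scores)
    ranks = []
    for i in range(n):
        r = 1.0
        for j in range(n):
            if j != i:
                r += _sigmoid((scores[j] - scores[i]) / tau)
        ranks.append(r)
    return ranks


def soft_topk_gate(scores: List[float], k: int, tau: float = 0.1,
                   tau_gate: float = 0.1) -> List[float]:
    """Per-channel gate in (0, 1): ~1 for the top-k ranked channels, ~0 otherwise."""
    ranks = soft_rank(scores, tau)
    # Soft threshold placed halfway between rank k and rank k + 1.
    return [_sigmoid((k + 0.5 - r) / tau_gate) for r in ranks]


def channel_activation(features: List[List[float]], k: int) -> List[List[float]]:
    """Hypothetical channel-wise activation: scale each channel by its soft gate.

    `features` is a list of channels, each a flat list of spatial values.
    ASSUMPTION: channel importance = mean absolute activation of the channel.
    """
    scores = [sum(abs(v) for v in ch) / len(ch) for ch in features]
    gates = soft_topk_gate(scores, k)
    return [[g * v for v in ch] for g, ch in zip(gates, features)]


if __name__ == "__main__":
    feats = [[0.9, 1.1], [0.1, -0.1], [2.0, 2.2], [0.4, 0.6]]
    print(channel_activation(feats, k=2))
```

With `k=2`, the two channels with the largest mean magnitude (indices 2 and 0) pass almost unchanged while the others are pushed toward zero; because every step is built from sigmoids, the gate stays differentiable. The pairwise soft rank costs O(n^2) in the number of channels, which is acceptable for a sketch; a practical version would vectorize it in a tensor library.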

URL

https://arxiv.org/abs/2404.19567

PDF

https://arxiv.org/pdf/2404.19567.pdf
