Paper Reading AI Learner

DeAR: Debiasing Vision-Language Models with Additive Residuals

2023-03-18 14:57:43
Ashish Seth, Mayur Hemani, Chirag Agarwal

Abstract

Large pre-trained vision-language models (VLMs) reduce the time for developing predictive models for various vision-grounded language downstream tasks by providing rich, adaptable image and text representations. However, these models suffer from societal biases owing to the skewed distribution of various identity groups in the training data. These biases manifest as the skewed similarity between the representations for specific text concepts and images of people of different identity groups and, therefore, limit the usefulness of such models in real-world high-stakes applications. In this work, we present DeAR (Debiasing with Additive Residuals), a novel debiasing method that learns additive residual image representations to offset the original representations, ensuring fair output representations. In doing so, it reduces the ability of the representations to distinguish between the different identity groups. Further, we observe that the current fairness tests are performed on limited face image datasets that fail to indicate why a specific text concept should/should not apply to them. To bridge this gap and better evaluate DeAR, we introduce the Protected Attribute Tag Association (PATA) dataset - a new context-based bias benchmarking dataset for evaluating the fairness of large pre-trained VLMs. Additionally, PATA provides visual context for a diverse human population in different scenarios with both positive and negative connotations. Experimental results for fairness and zero-shot performance preservation using multiple datasets demonstrate the efficacy of our framework.
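The core mechanism described above — adding a learned residual to the frozen VLM image embedding so the output no longer encodes protected-attribute information — can be sketched minimally as follows. This is an illustrative sketch only, not the authors' implementation: the embedding size, hidden size, and the untrained random-weight MLP are all assumptions; in DeAR the residual network would be trained (with the VLM frozen) so that identity groups become indistinguishable while zero-shot performance is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper).
DIM, HIDDEN = 512, 256

# A small one-hidden-layer MLP that predicts an additive offset
# for each image embedding. Weights here are random stand-ins;
# in practice they would be learned with a debiasing objective.
W1 = rng.normal(0.0, 0.02, (DIM, HIDDEN))
W2 = rng.normal(0.0, 0.02, (HIDDEN, DIM))

def debias(z):
    # residual = MLP(z); debiased representation = z + residual
    residual = np.maximum(z @ W1, 0.0) @ W2   # ReLU MLP
    return z + residual

z = rng.normal(size=(4, DIM))   # a batch of frozen VLM image embeddings
z_fair = debias(z)
assert z_fair.shape == z.shape  # the offset preserves the embedding shape
```

Because the correction is purely additive on top of frozen representations, downstream zero-shot pipelines can consume `z_fair` exactly as they would the original embeddings.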

URL

https://arxiv.org/abs/2303.10431

PDF

https://arxiv.org/pdf/2303.10431.pdf

