Paper Reading AI Learner

Are Models Biased on Text without Gender-related Language?

2024-05-01 15:51:15
Catarina G Belém, Preethi Seshadri, Yasaman Razeghi, Sameer Singh

Abstract

Gender bias research has been pivotal in revealing undesirable behaviors in large language models, exposing serious gender stereotypes associated with occupations and emotions. A key observation in prior work is that models reinforce stereotypes as a consequence of the gendered correlations present in the training data. In this paper, we focus on bias where the effect of training data is unclear, and instead address the question: do language models still exhibit gender bias in non-stereotypical settings? To do so, we introduce UnStereoEval (USE), a novel framework tailored for investigating gender bias in stereotype-free scenarios. USE defines a sentence-level score based on pretraining data statistics to determine whether a sentence contains minimal word-gender associations. To systematically benchmark the fairness of popular language models in stereotype-free scenarios, we use USE to automatically generate benchmarks without any gender-related language. By leveraging USE's sentence-level score, we also repurpose prior gender bias benchmarks (WinoBias and Winogender) for non-stereotypical evaluation. Surprisingly, we find low fairness across all 28 tested models. Concretely, models demonstrate fair behavior in only 9%-41% of stereotype-free sentences, suggesting that bias does not stem solely from the presence of gender-related words. These results raise important questions about where underlying model biases come from and highlight the need for more systematic and comprehensive bias evaluation. We release the full dataset and code at this https URL.
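The abstract does not spell out how the sentence-level score is computed. A minimal sketch of the idea, assuming a PMI-style log-ratio of each word's co-occurrence with male vs. female terms in the pretraining corpus (all counts, function names, and the threshold `tau` below are illustrative, not the paper's actual definitions):

```python
import math

# Toy co-occurrence counts: word -> (count near male terms, count near female terms).
# In practice these would be gathered from pretraining data statistics.
cooc = {
    "doctor": (900, 300),
    "walked": (500, 480),
    "home": (400, 420),
}

def gender_association(word):
    """Log-ratio of male vs. female co-occurrence, with add-one smoothing."""
    m, f = cooc[word]
    return math.log((m + 1) / (f + 1))

def sentence_score(words):
    """Sentence-level score: the strongest word-gender association in the sentence."""
    return max(abs(gender_association(w)) for w in words)

def is_stereotype_free(words, tau=0.1):
    """A sentence qualifies as stereotype-free if no word is strongly gender-associated."""
    return sentence_score(words) <= tau
```

Under this sketch, a sentence built only from near-neutral words ("walked", "home") passes the filter, while one containing "doctor" (heavily male-skewed in the toy counts) does not; filtering benchmark sentences this way is what isolates bias that cannot be explained by gendered vocabulary.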

URL

https://arxiv.org/abs/2405.00588

PDF

https://arxiv.org/pdf/2405.00588.pdf

