Paper Reading AI Learner

White Men Lead, Black Women Help: Uncovering Gender, Racial, and Intersectional Bias in Language Agency

2024-04-16 12:27:54
Yixin Wan, Kai-Wei Chang

Abstract

Social biases can manifest in language agency. For instance, White individuals and men are often described as "agentic" and achievement-oriented, whereas Black individuals and women are frequently described as "communal" and as playing assisting roles. This study establishes language agency as an important aspect of studying social biases in both human-written and Large Language Model (LLM)-generated texts. To accurately measure "language agency" at the sentence level, we propose a Language Agency Classification dataset for training reliable agency classifiers. We then use an agency classifier to reveal notable language agency biases in 6 datasets of human- or LLM-written texts, including biographies, professor reviews, and reference letters. While most prior NLP research on agency biases has focused on a single demographic dimension, we comprehensively explore language agency biases across gender, race, and intersectional identities. We observe that (1) language agency biases in human-written texts align with real-world social observations; (2) LLM-generated texts demonstrate remarkably higher levels of language agency bias than human-written texts; and (3) the most critical language agency biases target minority groups: for instance, the language used to describe Black females exhibits the lowest level of agency across datasets. Our findings reveal intricate social biases in human- and LLM-written texts through the lens of language agency, cautioning against using LLM generations in social contexts without scrutiny.
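The measurement pipeline the abstract describes (classify each sentence as agentic or communal, then compare agency rates across demographic groups) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the model path, the "agentic"/"communal" label names, and the toy corpus are hypothetical stand-ins, not the authors' released dataset or classifier.

```python
# Hypothetical sketch of sentence-level language agency measurement.
# Assumes a text-classification model fine-tuned to output "agentic" vs.
# "communal" labels (the paper trains such a classifier on its proposed
# Language Agency Classification dataset); the model path is a placeholder.
from collections import defaultdict

from transformers import pipeline

classifier = pipeline("text-classification", model="path/to/agency-classifier")

# Toy (demographic group, sentence) pairs standing in for corpora such as
# biographies, professor reviews, or reference letters.
corpus = [
    ("White male", "He spearheaded the project and drove it to completion."),
    ("Black female", "She was always willing to help her colleagues."),
]

agentic_counts = defaultdict(int)
totals = defaultdict(int)

for group, sentence in corpus:
    label = classifier(sentence)[0]["label"]  # top predicted label
    totals[group] += 1
    if label.lower() == "agentic":
        agentic_counts[group] += 1

# The gap in agentic-sentence ratios between groups is one way to quantify
# language agency bias in a corpus.
for group, total in totals.items():
    print(f"{group}: {agentic_counts[group] / total:.2%} agentic sentences")
```

The same aggregation can be run separately on human-written and LLM-generated versions of a corpus to compare how strongly each exhibits the gap.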


URL

https://arxiv.org/abs/2404.10508

PDF

https://arxiv.org/pdf/2404.10508.pdf

