Gender Inclusivity Fairness Index: A Multilevel Framework for Evaluating Gender Diversity in Large Language Models

2025-06-18 15:43:16
Zhengyang Shan, Emily Ruth Diana, Jiawei Zhou

Abstract

We present a comprehensive evaluation of gender fairness in large language models (LLMs), focusing on their ability to handle both binary and non-binary genders. While previous studies primarily focus on binary gender distinctions, we introduce the Gender Inclusivity Fairness Index (GIFI), a novel and comprehensive metric that quantifies the diverse gender inclusivity of LLMs. GIFI consists of a wide range of evaluations at different levels, from simply probing the model with respect to provided gender pronouns to testing various aspects of model generation and cognitive behaviors under different gender assumptions, revealing biases associated with varying gender identifiers. We conduct extensive evaluations with GIFI on 22 prominent open-source and proprietary LLMs of varying sizes and capabilities, discovering significant variations in LLMs' gender inclusivity. Our study highlights the importance of improving LLMs' inclusivity, providing a critical benchmark for future advancements in gender fairness in generative models.

URL

https://arxiv.org/abs/2506.15568

PDF

https://arxiv.org/pdf/2506.15568.pdf

