Paper Reading AI Learner

Beyond Performance: Quantifying and Mitigating Label Bias in LLMs

2024-05-04 19:53:03
Yuval Reif, Roy Schwartz

Abstract

Large language models (LLMs) have shown remarkable adaptability to diverse tasks by leveraging context prompts containing instructions or minimal input-output examples. However, recent work revealed that they also exhibit label bias, an undesirable preference toward predicting certain answers over others. Still, detecting and measuring this bias reliably and at scale has remained relatively unexplored. In this study, we evaluate different approaches to quantifying label bias in a model's predictions, conducting a comprehensive investigation across 279 classification tasks and ten LLMs. Our investigation reveals substantial label bias in models both before and after debiasing attempts, and highlights the importance of outcomes-based evaluation metrics, which had not previously been used for this purpose. We further propose a novel label bias calibration method tailored for few-shot prompting, which outperforms recent calibration approaches both in improving performance and in mitigating label bias. Our results emphasize that label bias in the predictions of LLMs remains a barrier to their reliability.
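The abstract covers two tasks: measuring label bias from a model's outcomes, and reducing it through calibration. The sketch below is not the authors' code. It assumes one plausible outcomes-based measure, the relative standard deviation (std / mean) of per-class recall. For mitigation it uses contextual calibration (Zhao et al., 2021), an earlier calibration baseline, rather than the few-shot calibration method this paper proposes. The function names, the toy probabilities, and the content-free output are hypothetical.

```python
import numpy as np

def per_class_recall(y_true, y_pred, num_classes):
    """Fraction of gold examples of each class that the model labels correctly."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            recalls[c] = np.mean(y_pred[mask] == c)
    return recalls

def recall_rsd(y_true, y_pred, num_classes):
    """Relative standard deviation (std / mean) of per-class recall.
    Assumed bias measure: 0 means all classes are recalled equally well;
    larger values mean errors concentrate on some labels, i.e. the model
    favors others."""
    r = per_class_recall(y_true, y_pred, num_classes)
    r = r[~np.isnan(r)]  # ignore classes absent from y_true
    return float(np.std(r) / np.mean(r))

def contextual_calibrate(probs, content_free_probs):
    """Contextual calibration (Zhao et al., 2021), not this paper's method:
    divide each label's probability by the probability the model assigns it
    on a content-free input (e.g. "N/A"), then renormalize each row."""
    adjusted = np.asarray(probs) / np.asarray(content_free_probs)
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Hypothetical toy data: 2 classes, model probabilities skewed toward label 0.
y_true = np.array([0, 0, 1, 1])
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]])
content_free = np.array([0.7, 0.3])  # hypothetical output on a content-free prompt

before = recall_rsd(y_true, probs.argmax(axis=1), num_classes=2)
after = recall_rsd(y_true, contextual_calibrate(probs, content_free).argmax(axis=1), num_classes=2)
print(f"RSD before: {before:.3f}, after: {after:.3f}")
```

On this toy data the uncalibrated model recalls every class-0 example but only half of the class-1 examples. The script therefore prints "RSD before: 0.333, after: 0.000". An outcomes-based measure like this captures the skew in the model's actual predictions, which accuracy alone can hide.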

URL

https://arxiv.org/abs/2405.02743

PDF

https://arxiv.org/pdf/2405.02743.pdf
