Abstract
Detecting stereotypes and biases in Large Language Models (LLMs) is crucial for enhancing fairness and reducing adverse impacts on individuals or groups when these models are deployed. Traditional methods, which rely on embedding spaces or probability-based metrics, fall short in revealing the nuanced and implicit biases that arise in varied contexts. To address this challenge, we propose the FairMonitor framework, which adopts a combined static-dynamic detection method for a comprehensive evaluation of stereotypes and biases in LLMs. The static component consists of a direct inquiry test, an implicit association test, and an unknown situation test, comprising 10,262 open-ended questions that cover 9 sensitive factors and 26 educational scenarios; it is effective for evaluating both explicit and implicit biases. In addition, we utilize a multi-agent system to construct dynamic scenarios for detecting subtle biases in more complex and realistic settings. This component detects biases from the interaction behaviors of LLMs across 600 varied educational scenarios. The experimental results show that the combination of static and dynamic methods detects more stereotypes and biases in LLMs.
URL
https://arxiv.org/abs/2405.03098