Abstract
To mitigate gender bias in contextualized language models, various intrinsic mitigation strategies have been proposed, alongside many bias metrics. Because these language models are ultimately used for downstream tasks such as text classification, it is important to understand whether, and to what extent, these intrinsic bias mitigation strategies actually translate into fairness on downstream tasks. In this work, we design a probe to investigate the effects that some of the major intrinsic gender bias mitigation strategies have on downstream text classification tasks. We discover that, instead of resolving gender bias, intrinsic mitigation techniques and metrics can hide it in such a way that significant gender information is retained in the embeddings. Furthermore, we show that each mitigation technique can hide the bias from some, but not all, of the intrinsic bias measures, and each intrinsic bias measure can be fooled by some, but not all, of the mitigation techniques. We confirm experimentally that none of the intrinsic mitigation techniques, when used without any other fairness intervention, consistently impacts extrinsic bias. We recommend that intrinsic bias mitigation techniques be combined with other fairness interventions for downstream tasks.
URL
https://arxiv.org/abs/2301.12855