Abstract
Gender bias exists in natural language datasets, and neural language models tend to learn it, resulting in biased text generation. In this research, we propose a debiasing approach based on modifying the loss function. We introduce a new term into the loss function that attempts to equalize the probabilities of male and female words in the output. Using an array of bias evaluation metrics, we provide empirical evidence that our approach successfully mitigates gender bias in language models without increasing perplexity. Compared to the existing debiasing strategies of data augmentation and word-embedding debiasing, our method performs better in several respects, especially in reducing gender bias in occupation words. Finally, we combine data augmentation with our approach and show that the combination outperforms existing strategies on all bias evaluation metrics.
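The abstract describes adding a term to the loss that pushes the model toward equal output probabilities for paired male/female words. The sketch below is one plausible instantiation of such a term, not the paper's exact formulation: it adds to the standard cross-entropy a penalty on the squared difference of log-probabilities assigned to each hypothetical (male, female) word-index pair, weighted by a hyperparameter `lam`.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the vocabulary dimension.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def debiasing_loss(logits, targets, gender_pairs, lam=1.0):
    """Cross-entropy plus an equalization penalty (illustrative sketch).

    logits:       (batch, vocab) unnormalized scores
    targets:      (batch,) gold token indices
    gender_pairs: list of (male_idx, female_idx) vocabulary index pairs
    lam:          weight of the debiasing term (hypothetical hyperparameter)
    """
    probs = softmax(logits)
    n = logits.shape[0]
    ce = -np.mean(np.log(probs[np.arange(n), targets]))
    # Penalize any gap between the log-probabilities of paired gendered words;
    # the penalty is zero exactly when each pair receives equal probability.
    penalty = 0.0
    for m, f in gender_pairs:
        penalty += np.mean((np.log(probs[:, m]) - np.log(probs[:, f])) ** 2)
    penalty /= len(gender_pairs)
    return ce + lam * penalty
```

Because the penalty is non-negative and vanishes only when paired words are equally probable, minimizing the combined objective trades off language-modeling fit against output-level gender balance, with `lam` controlling the trade-off.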
URL
https://arxiv.org/abs/1905.12801