Abstract
Using novel approaches to dataset development, the Biasly dataset captures the nuance and subtlety of misogyny in ways that are unique within the literature. Built in collaboration with multidisciplinary experts and the annotators themselves, the dataset contains annotations of movie subtitles, capturing colloquial expressions of misogyny in North American film. The dataset can be used for a range of NLP tasks, including classification, severity score regression, and text generation for rewrites. In this paper, we discuss the methodology used, analyze the annotations obtained, and provide baselines using common NLP algorithms in the context of misogyny detection and mitigation. We hope this work will promote AI for social good in NLP for bias detection, explanation, and removal.
URL
https://arxiv.org/abs/2311.09443