Abstract
To address the limitations of current hate speech detection models, we introduce \textsf{SGHateCheck}, a novel framework designed for the linguistic and cultural context of Singapore and Southeast Asia. It extends the functional testing approach of HateCheck and MHC, employing large language models for translation and paraphrasing into Singapore's main languages, and refining the results with native annotators. \textsf{SGHateCheck} reveals critical flaws in state-of-the-art models, highlighting their inadequacy in sensitive content moderation. This work aims to foster the development of more effective hate speech detection tools for diverse linguistic environments, particularly in Singapore and Southeast Asia.
URL
https://arxiv.org/abs/2405.01842