Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement

Abstract

Content moderation faces a challenging task as social media's ability to spread hate speech contrasts with its role in promoting global connectivity. With rapidly evolving slang and hate speech, the adaptability of conventional deep learning to the fluid landscape of online dialogue remains limited. In response, causality-inspired disentanglement has shown promise by segregating platform-specific peculiarities from universal hate indicators. However, its dependency on available ground-truth target labels for discerning these nuances faces practical hurdles with the incessant evolution of platforms and the mutable nature of hate speech. Using confidence-based reweighting and contrastive regularization, this study presents HATE WATCH, a novel framework of weakly supervised causal disentanglement that circumvents the need for explicit target labeling and effectively disentangles input features into invariant representations of hate. Empirical validation across four platforms, two with target labels and two without, positions HATE WATCH as a novel method in cross-platform hate speech detection with superior performance. HATE WATCH advances scalable content moderation techniques towards developing safer online communities.
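
The abstract names confidence-based reweighting and contrastive regularization but gives no formulas, so the following is only a rough sketch of what such terms commonly look like, not the authors' implementation. The function names, the `threshold` and `temperature` parameters, and the choice of a supervised-contrastive-style loss are all assumptions made for illustration.

```python
import numpy as np

def confidence_weights(weak_probs, threshold=0.6):
    """Hypothetical reweighting: weight each sample by the confidence of its
    weak (pseudo) label and drop samples below `threshold`."""
    conf = weak_probs.max(axis=1)
    w = np.where(conf >= threshold, conf, 0.0)
    total = w.sum()
    return w / total if total > 0 else w

def weighted_cross_entropy(pred_probs, weak_labels, weights, eps=1e-12):
    """Cross-entropy against weak labels, scaled per sample by `weights`."""
    picked = pred_probs[np.arange(len(weak_labels)), weak_labels]
    return float(-(weights * np.log(picked + eps)).sum())

def contrastive_regularizer(z, labels, temperature=0.5):
    """Assumed supervised-contrastive-style term: representations that share a
    weak label are pulled together, all others are pushed apart."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    not_self = ~np.eye(z.shape[0], dtype=bool)
    sim_max = sim.max(axis=1, keepdims=True)  # subtract row max for stability
    exp = np.exp(sim - sim_max) * not_self    # exclude each anchor itself
    log_prob = sim - sim_max - np.log(exp.sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0
    per_anchor = -(log_prob * positives).sum(axis=1)[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())
```

In a training loop, one would typically add the two terms with a tunable coefficient, for example `loss = weighted_cross_entropy(...) + lam * contrastive_regularizer(...)`, where `lam` is again a hypothetical hyperparameter.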

URL

https://arxiv.org/abs/2404.11036

PDF

https://arxiv.org/pdf/2404.11036.pdf
