Causality Guided Representation Learning for Cross-Style Hate Speech Detection

2025-10-09 02:41:37
Chengshuai Zhao, Shu Wan, Paras Sheth, Karan Patwa, K. Selçuk Candan, Huan Liu

Abstract

The proliferation of online hate speech poses a significant threat to the harmony of the web. While explicit hate is easily recognized through overt slurs, implicit hate speech is often conveyed through sarcasm, irony, stereotypes, or coded language -- making it harder to detect. Existing hate speech detection models, which predominantly rely on surface-level linguistic cues, fail to generalize effectively across diverse stylistic variations. Moreover, hate speech spread on different platforms often targets distinct groups and adopts unique styles, potentially inducing spurious correlations between them and labels, further challenging current detection approaches. Motivated by these observations, we hypothesize that the generation of hate speech can be modeled as a causal graph involving key factors: contextual environment, creator motivation, target, and style. Guided by this graph, we propose CADET, a causal representation learning framework that disentangles hate speech into interpretable latent factors and then controls confounders, thereby isolating genuine hate intent from superficial linguistic cues. Furthermore, CADET allows counterfactual reasoning by intervening on style within the latent space, naturally guiding the model to robustly identify hate speech in varying forms. CADET demonstrates superior performance in comprehensive experiments, highlighting the potential of causal priors in advancing generalizable hate speech detection.
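
The idea of intervening on style in the latent space can be illustrated with a toy sketch. This is not the CADET implementation: the `Latent` class, the two-dimensional factor vectors, the `intervene_on_style` and `hate_score` helpers, and all numeric values are hypothetical and chosen only for illustration. The sketch assumes the representation has already been split into the four factors named in the abstract, and it shows that a scorer reading only the non-style factors is unaffected when style is swapped.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Latent:
    """Hypothetical disentangled code: one small vector per factor."""
    context: Vector
    motivation: Vector
    target: Vector
    style: Vector


def intervene_on_style(z: Latent, new_style: Vector) -> Latent:
    """Counterfactual edit: overwrite the style factor, keep the others fixed."""
    return replace(z, style=new_style)


def hate_score(z: Latent) -> float:
    """Toy scorer (an assumption): averages motivation and target, ignores style."""
    features = z.motivation + z.target
    return sum(features) / len(features)


# Made-up latent code for a post written in an explicit style.
explicit = Latent(
    context=(0.2, 0.1),
    motivation=(0.9, 0.8),
    target=(0.7, 0.9),
    style=(1.0, 0.0),
)

# Same content, counterfactually restyled (for example, as sarcasm).
implicit = intervene_on_style(explicit, (0.0, 1.0))

# The score is identical because the scorer never reads the style factor.
print(hate_score(explicit) == hate_score(implicit))
```

In a learned model the factors would come from an encoder and the invariance would have to be encouraged by training rather than guaranteed by construction; here it holds trivially because `hate_score` never touches `style`.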

URL

https://arxiv.org/abs/2510.07707

PDF

https://arxiv.org/pdf/2510.07707.pdf

