Paper Reading AI Learner

Generative AI for Hate Speech Detection: Evaluation and Findings

2023-11-16 16:09:43
Sagi Pendzel, Tomer Wullach, Amir Adler, Einat Minkov


Automatic hate speech detection using deep neural models is hampered by the scarcity of labeled datasets, which leads to poor generalization. To mitigate this problem, generative AI has been used to generate large amounts of synthetic hate speech sequences from available labeled examples, and the generated data have then been used to finetune large pre-trained language models (LLMs). In this chapter, we review the relevant methods, experimental setups and evaluation of this approach. In addition to general LLMs, such as BERT, RoBERTa and ALBERT, we apply and evaluate the impact of train set augmentation with generated data using LLMs that have already been adapted for hate detection, including RoBERTa-Toxicity, HateBERT, HateXplain, ToxDect, and ToxiGen. An empirical study corroborates our previous findings, showing that this approach improves generalization in hate speech detection, boosting recall across data distributions. In addition, we compare the performance of the finetuned LLMs with zero-shot hate detection using a GPT-3.5 model. Our results demonstrate that while the GPT-3.5 model achieves better generalization, it yields mediocre recall and low precision on most datasets. It remains an open question whether the sensitivity of models such as GPT-3.5 and its successors can be improved using similar text generation techniques.
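As a rough illustration of the setup summarized above, the sketch below appends generated hate sequences to a labeled train set and computes precision and recall for the hate class, the two metrics the findings contrast. It is a minimal stdlib sketch, not the authors' code: the `Example` type, the `augment` and `precision_recall` helpers, and the choice to label every generated sequence as hate are hypothetical assumptions. The text generator itself and the finetuning of the BERT-style classifiers are not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """Hypothetical labeled example: label 1 = hate, 0 = non-hate."""
    text: str
    label: int


def augment(train, synthetic_hate, seed=0):
    """Append generated sequences to the labeled train set and shuffle.

    Assumption: every generated sequence is assigned the hate label (1).
    """
    augmented = list(train) + [Example(text, 1) for text in synthetic_hate]
    random.Random(seed).shuffle(augmented)  # fixed seed for reproducibility
    return augmented


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall of the positive (hate) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    train = [Example("labeled text A", 0), Example("labeled text B", 1)]
    augmented = augment(train, ["generated text 1", "generated text 2"])
    print(len(augmented), sum(e.label for e in augmented))
    print(precision_recall([1, 1, 1, 0, 0], [1, 1, 0, 1, 1]))
```

For instance, a classifier that flags every input as hate reaches a recall of 1.0 but a precision equal to the share of hate examples in the test set, which is why the two metrics are reported together.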



