Abstract
Communal violence in online forums has become extremely prevalent in South Asia, where many communities of different cultures coexist and share resources. These societies are characterized by strong bonds within their own groups and animosity towards others, leading to conflicts that frequently escalate into violent confrontations. To address this issue, we have developed the first comprehensive framework for the automatic detection of communal violence markers in online Bangla content, accompanied by the largest collection (13K raw sentences) of social media interactions that fall under the definitions of four major violence classes and their 16 coarse expressions. Our workflow introduces a 7-step expert annotation process incorporating insights from social scientists, linguists, and psychologists. By presenting data statistics and benchmarking performance on this dataset, we find that, aside from the category of Non-communal violence, Religio-communal violence is particularly pervasive in Bangla text. Moreover, we substantiate the effectiveness of fine-tuning language models for identifying violent comments through preliminary benchmarking on the state-of-the-art Bangla deep learning model.
URL
https://arxiv.org/abs/2404.11752