Evaluation of Deep Learning Models for Hostility Detection in Hindi Text

Abstract
Abstract (translated)
URL
PDF

Abstract

The social media platform is a convenient medium to express personal thoughts and share useful information. It is fast, concise, and has the ability to reach millions. It is an effective place to archive thoughts, share artistic content, receive feedback, promote products, etc. Despite having numerous advantages these platforms have given a boost to hostile posts. Hate speech and derogatory remarks are being posted for personal satisfaction or political gain. The hostile posts can have a bullying effect rendering the entire platform experience hostile. Therefore detection of hostile posts is important to maintain social media hygiene. The problem is more pronounced languages like Hindi which are low in resources. In this work, we present approaches for hostile text detection in the Hindi language. The proposed approaches are evaluated on the Constraint@AAAI 2021 Hindi hostility detection dataset. The dataset consists of hostile and non-hostile texts collected from social media platforms. The hostile posts are further segregated into overlapping classes of fake, offensive, hate, and defamation. We evaluate a host of deep learning approaches based on CNN and LSTM for this multi-label classification problem. The pre-trained Hindi fast text word embeddings by IndicNLP and Facebook are used in conjunction with these models to evaluate their effectiveness. We show that the multi-CNN model when combined with IndicNLP FastText word embedding gives the best results.

Abstract (translated)

URL

https://arxiv.org/abs/2101.04144

PDF

https://arxiv.org/pdf/2101.04144.pdf