Misinformation detection in Luganda-English code-mixed social media text

2021-03-31 21:12:29

Peter Nabende, David Kabiito, Claire Babirye, Hewitt Tusiime, Joyce Nakatumba-Nabende

arXiv_CL

Abstract

The increasing occurrence, forms, and negative effects of misinformation on social media platforms have necessitated more misinformation detection tools. Work is currently being done to address COVID-19 misinformation; however, there are no misinformation detection tools for any of the 40 distinct indigenous Ugandan languages. This paper addresses this gap by presenting basic language resources and a misinformation detection dataset based on code-mixed Luganda-English messages sourced from the Facebook and Twitter social media platforms. Several machine learning methods are applied to the misinformation detection dataset to develop classification models for detecting whether a code-mixed Luganda-English message contains misinformation. A 10-fold cross-validation evaluation of the classification methods in an experimental misinformation detection task shows that a Discriminative Multinomial Naive Bayes (DMNB) method achieves the highest accuracy and F-measure, 78.19% and 77.90% respectively. Support Vector Machine and Bagging ensemble classification models achieve comparable results. These results are promising since the machine learning models are based on n-gram features from only the misinformation detection dataset.
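
The evaluation setup described in the abstract (n-gram features, a Naive Bayes classifier, 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses a standard multinomial Naive Bayes with add-one smoothing rather than DMNB, word unigrams and bigrams as an assumed feature set, and invented English placeholder messages rather than the paper's Luganda-English data. The names `ngrams`, `MultinomialNB`, and `cross_validate` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return word n-grams (n = 1..n_max) of a lowercased, whitespace-split message."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return feats


class MultinomialNB:
    """Standard multinomial Naive Bayes with add-one smoothing (a stand-in, not DMNB)."""

    def fit(self, texts, labels):
        self.counts = defaultdict(Counter)   # per-class n-gram counts
        self.class_totals = Counter(labels)  # number of messages per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = ngrams(text)
            self.counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        n_docs = sum(self.class_totals.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_totals.items():
            total = sum(self.counts[label].values()) + len(self.vocab)
            score = math.log(n_label / n_docs)  # log prior
            for feat in ngrams(text):
                if feat in self.vocab:  # n-grams unseen in training are ignored
                    score += math.log((self.counts[label][feat] + 1) / total)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(texts, labels, k=10):
    """Mean accuracy over k folds; fold i holds out items i, i+k, i+2k, ..."""
    accuracies = []
    for fold in range(k):
        test_idx = set(range(fold, len(texts), k))
        train = [(t, y) for i, (t, y) in enumerate(zip(texts, labels))
                 if i not in test_idx]
        model = MultinomialNB().fit(*zip(*train))
        correct = sum(model.predict(texts[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Invented placeholder messages (NOT from the paper's dataset), alternating labels.
texts = [f"miracle cure rumor {i}" if i % 2 == 0 else f"wear mask advice {i}"
         for i in range(20)]
labels = ["misinformation" if i % 2 == 0 else "other" for i in range(20)]

print(f"mean 10-fold accuracy: {cross_validate(texts, labels):.2%}")
```

The placeholder data is trivially separable, so the score it prints says nothing about the paper's task; the sketch only shows how the features, the classifier, and the fold loop fit together. A real replication would also shuffle and stratify the folds and report F-measure alongside accuracy.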

URL

https://arxiv.org/abs/2104.00124

PDF

https://arxiv.org/pdf/2104.00124.pdf