Abstract
Existing machine-learning techniques achieve close to human performance on text-based classification tasks. However, multi-modal noise in chat data, such as emoticons, slang, spelling mistakes, and code-mixed text, causes existing deep-learning solutions to perform poorly. The inability of deep-learning systems to robustly capture these covariates caps their performance. We propose NELEC: Neural and Lexical Combiner, a system that combines textual and deep-learning based methods for sentiment classification. We evaluate our system on Task 3 of SemEval-2019, 'Contextual Emotion Detection in Text'. Our system significantly outperforms both the baseline and our deep-learning benchmark models. It achieved a micro-averaged F1 score of 0.7765, ranking 3rd on the test-set leaderboard. Our code is available at https://github.com/iamgroot42/nelec
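For readers unfamiliar with the reported metric, the following is a minimal sketch of how a micro-averaged F1 score can be computed: true positives, false positives, and false negatives are pooled across classes before precision and recall are taken. The function name `micro_f1`, the label names, and the choice to score only the emotion classes (leaving out a catch-all class) are illustrative assumptions, not details stated in the abstract, and this is not the official task scorer.

```python
def micro_f1(gold, pred, labels):
    """Micro-averaged F1 over the classes in `labels`.

    Counts are pooled across all scored classes before computing
    precision and recall. Classes not in `labels` are not scored
    themselves, but confusing them with a scored class still counts
    as an error.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p in labels:
            if g == p:
                tp += 1          # correct prediction of a scored class
            else:
                fp += 1          # predicted a scored class wrongly
        if g in labels and g != p:
            fn += 1              # missed a gold instance of a scored class
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data; the label names are assumptions for illustration.
gold = ["happy", "sad", "angry", "others", "happy"]
pred = ["happy", "angry", "angry", "happy", "others"]
scored = {"happy", "sad", "angry"}
score = micro_f1(gold, pred, scored)
print(f"micro-F1 = {score:.4f}")
```

Because counts are pooled, frequent classes influence the score more than rare ones, which is the main difference from a macro average.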
URL
https://arxiv.org/abs/1904.03223