Abstract
Recently, Deep Learning (DL) approaches have been applied to the Sentiment Classification (SC) problem, a core task in review mining, or Sentiment Analysis (SA). The performance of these approaches is affected by several factors. This paper examines these factors and classifies them into three categories: data-preparation-based factors, feature-representation-based factors, and classification-technique-based factors. The paper is a comprehensive literature-based survey that compares the performance of more than 100 DL-based SC approaches on 21 public datasets of customer reviews from three application domains (products, movies, and restaurants). These 21 datasets differ in their characteristics (balanced/imbalanced, size, etc.), which gives the study a broad view. The comparison explains how the proposed factors quantitatively affect the performance of the studied DL-based SC approaches.
URL
https://arxiv.org/abs/2312.17253