Abstract
One of the challenges of natural language understanding is dealing with the subjectivity of sentences, which may express opinions and emotions that add layers of complexity and nuance. Sentiment analysis is a field that aims to extract and analyze these subjective elements from text, and it can be applied at different levels of granularity, such as document, paragraph, sentence, or aspect. Aspect-based sentiment analysis is a well-studied topic with many available datasets and models. However, there is no clear definition of what makes a sentence difficult for aspect-based sentiment analysis. In this paper, we explore this question through an experiment with three datasets, "Laptops", "Restaurants", and "MTSC" (Multi-Target-dependent Sentiment Classification), and a merged version of the three. We study the impact of domain diversity and syntactic diversity on difficulty. We use a combination of classifiers to identify the most difficult sentences and analyze their characteristics. We employ two definitions of sentence difficulty. The first is binary and labels a sentence as difficult if the classifiers fail to correctly predict its sentiment polarity. The second is a six-level scale based on how many of the five best-performing classifiers correctly predict the sentiment polarity. We also define nine linguistic features that, in combination, aim to estimate difficulty at the sentence level.
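The two difficulty definitions can be illustrated with a minimal sketch. The function names and the string polarity labels are hypothetical, and two readings are assumptions rather than details stated in the abstract: the binary label is taken to mean that none of the classifiers predicts the gold polarity, and the six-level scale is taken to count how many of the five classifiers fail (0 = easiest, 5 = hardest).

```python
from typing import Sequence


def binary_difficulty(gold: str, predictions: Sequence[str]) -> bool:
    """Return True if the sentence is 'difficult' under the binary definition.

    Assumption: 'the classifiers fail' is read as 'no classifier predicts
    the gold polarity'. The paper may use a different aggregation rule.
    """
    return all(pred != gold for pred in predictions)


def graded_difficulty(gold: str, top5_predictions: Sequence[str]) -> int:
    """Return a difficulty level from 0 to 5 (six levels).

    Assumption: the level is the number of the five best-performing
    classifiers that mispredict the polarity, so 0 means all five are
    correct and 5 means none are.
    """
    if len(top5_predictions) != 5:
        raise ValueError("expected predictions from exactly five classifiers")
    return sum(pred != gold for pred in top5_predictions)


# Hypothetical predictions for one aspect whose gold polarity is "negative".
preds = ["negative", "negative", "neutral", "positive", "negative"]
is_difficult = binary_difficulty("negative", preds)  # three classifiers are correct
level = graded_difficulty("negative", preds)         # two classifiers fail
```

Under these assumptions the binary label corresponds to the top level of the graded scale when both are computed from the same five classifiers.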
URL
https://arxiv.org/abs/2402.03163