Transfer learning plays an essential role in deep learning, as it can remarkably improve performance on a target domain whose training data is insufficient. Our work explores beyond the common practice of transfer learning with a single pre-trained model. We focus on the task of Vietnamese sentiment classification and propose LIFA, a framework for learning a unified embedding from several pre-trained models. We further propose two LIFA variants that encourage the pre-trained models to either cooperate or compete with one another. Studying these variants sheds light on the success of LIFA by showing that sharing knowledge among the models is more beneficial for transfer learning. Moreover, we construct the AISIA-VN-Review-F dataset, the first large-scale Vietnamese sentiment classification database. We conduct extensive experiments on AISIA-VN-Review-F and existing benchmarks to demonstrate the efficacy of LIFA compared to other techniques. To contribute to Vietnamese NLP research, we will publish our source code and datasets to the research community upon acceptance.
https://arxiv.org/abs/2303.09115
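A minimal PyTorch sketch of the unified-embedding idea above: several frozen pre-trained encoders are projected into one shared space and combined with learned weights. The fusion scheme, dimensions, and the simplifying assumption that all encoders share a tokenizer are illustrative choices, not the paper's exact design.

    import torch
    import torch.nn as nn

    class UnifiedEmbedding(nn.Module):
        """Learn one embedding from several frozen pre-trained encoders."""
        def __init__(self, encoders, hidden_sizes, unified_dim=256, num_classes=3):
            super().__init__()
            self.encoders = nn.ModuleList(encoders)
            for enc in self.encoders:          # pre-trained models stay frozen
                for p in enc.parameters():
                    p.requires_grad = False
            # one projection per encoder into the shared space
            self.projections = nn.ModuleList(
                nn.Linear(h, unified_dim) for h in hidden_sizes)
            # learned convex weights over the encoders
            self.gate = nn.Parameter(torch.zeros(len(encoders)))
            self.classifier = nn.Linear(unified_dim, num_classes)

        def forward(self, input_ids, attention_mask):
            views = []
            for enc, proj in zip(self.encoders, self.projections):
                out = enc(input_ids=input_ids, attention_mask=attention_mask)
                views.append(proj(out.last_hidden_state[:, 0]))  # [CLS] vector
            weights = torch.softmax(self.gate, dim=0)            # sums to 1
            unified = sum(w * v for w, v in zip(weights, views))
            return self.classifier(unified)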
Sentiment classification is a fundamental task in natural language processing with important academic and commercial applications. It aims to automatically predict the degree of sentiment present in a text that contains opinions and subjectivity at some level, such as product and movie reviews or tweets. This can be difficult to accomplish, in part because different domains of text contain different words and expressions. The difficulty increases further when the text is written in a non-English language, due to the scarcity of databases and resources. As a consequence, several cross-domain and cross-language techniques are often applied to this task to improve results. In this work we study the ability of a classification system, trained on a large database of product reviews, to generalize to different Spanish domains. Reviews were collected from the MercadoLibre website across seven Latin American countries, allowing the creation of a large and balanced dataset. Results suggest that generalization across domains is feasible though very challenging when trained on these product reviews, and can be improved by pre-training and fine-tuning the classification model.
https://arxiv.org/abs/2303.08985
Long-sequence transformers are designed to improve the representation of longer texts by language models and their performance on downstream document-level tasks. However, little is understood about the quality of token-level predictions in long-form models. We investigate the performance of such architectures in the context of document classification with unsupervised rationale extraction. We find that standard soft attention methods perform significantly worse when combined with the Longformer language model. We propose a compositional soft attention architecture that applies RoBERTa sentence-wise to extract plausible rationales at the token level. We find this method to significantly outperform Longformer-driven baselines on sentiment classification datasets, while also exhibiting significantly lower runtimes.
https://arxiv.org/abs/2303.07991
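A sketch of the compositional idea from the abstract above: RoBERTa encodes each sentence independently, and a single soft attention layer over all token states produces both the document prediction and token-level rationale weights. The architectural details here are assumptions, not the authors' exact model.

    import torch
    import torch.nn as nn
    from transformers import AutoModel

    class SentenceWiseRationales(nn.Module):
        def __init__(self, name="roberta-base", num_classes=2):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(name)
            hidden = self.encoder.config.hidden_size
            self.scorer = nn.Linear(hidden, 1)        # token-level attention
            self.classifier = nn.Linear(hidden, num_classes)

        def forward(self, sentence_batches):
            # sentence_batches: one tokenizer output per sentence of a document,
            # e.g. [tok(s, return_tensors="pt") for s in sentences]
            states, masks = [], []
            for batch in sentence_batches:
                out = self.encoder(**batch).last_hidden_state    # (1, L, H)
                states.append(out.squeeze(0))
                masks.append(batch["attention_mask"].squeeze(0))
            tokens, mask = torch.cat(states), torch.cat(masks)   # all doc tokens
            scores = self.scorer(tokens).squeeze(-1)
            scores = scores.masked_fill(mask == 0, -1e9)
            alpha = torch.softmax(scores, dim=0)                 # rationale weights
            doc_vec = (alpha.unsqueeze(-1) * tokens).sum(0)
            return self.classifier(doc_vec), alpha

Because each sentence passes through RoBERTa separately, the quadratic attention cost applies per sentence rather than over the whole document, consistent with the lower runtimes the abstract reports.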
I propose a novel dual-attention model (DAM) for aspect-level sentiment classification. Many methods have been proposed for this task, such as support vector machines over hand-crafted features, long short-term memory networks based on attention mechanisms, and graph neural networks based on dependency parsing. While these methods all achieve decent performance, I argue that they all miss one important piece of syntactic information: dependency labels. Based on this idea, this paper proposes a model that feeds dependency labels into the attention mechanism for this task. I evaluate the proposed approach on three datasets: the laptop and restaurant datasets from SemEval 2014, and a Twitter dataset. Experimental results show that the dual-attention model performs well on all three datasets.
https://arxiv.org/abs/2303.07689
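A minimal sketch of an attention mechanism conditioned on dependency labels, the ingredient the abstract above argues is missing; the scoring function below is an assumption rather than the paper's exact formulation.

    import torch
    import torch.nn as nn

    class DependencyLabelAttention(nn.Module):
        def __init__(self, hidden=256, num_dep_labels=45):
            super().__init__()
            self.dep_embed = nn.Embedding(num_dep_labels, hidden)
            self.score = nn.Linear(3 * hidden, 1)

        def forward(self, word_states, aspect_vec, dep_label_ids):
            # word_states: (L, H) contextual word states
            # aspect_vec:  (H,)   pooled aspect representation
            # dep_label_ids: (L,) dependency label of each word's incoming arc
            dep = self.dep_embed(dep_label_ids)                        # (L, H)
            aspect = aspect_vec.unsqueeze(0).expand_as(word_states)
            scores = self.score(torch.cat([word_states, dep, aspect], dim=-1))
            alpha = torch.softmax(scores.squeeze(-1), dim=0)
            return (alpha.unsqueeze(-1) * word_states).sum(0)          # aspect-aware vector

    att = DependencyLabelAttention()
    vec = att(torch.randn(6, 256), torch.randn(256), torch.randint(0, 45, (6,)))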
As part of the recent research effort on quantum natural language processing (QNLP), variational quantum sentence classifiers (VQSCs) have been implemented and supported in lambeq / DisCoPy, based on the DisCoCat model of sentence meaning. We discuss VQSCs in some detail, including the underlying category theory, DisCoCat for modeling a sentence as a string diagram, and DisCoPy for encoding a string diagram as a parameterized quantum circuit. Many NLP tasks, however, require handling text consisting of multiple sentences, which is not supported in lambeq / DisCoPy. A good example is sentiment classification of customer feedback or product reviews. We discuss three potential approaches to variational quantum text classifiers (VQTCs), in line with VQSCs. The first is a weighted bag-of-sentences approach, which treats text as a group of independent sentences with task-specific sentence weighting. The second is a coreference resolution approach, which treats text as a consolidation of its member sentences with coreferences among them resolved. Both approaches are based on the DisCoCat model and should be implementable in lambeq / DisCoPy. The third approach, on the other hand, is based on the DisCoCirc model, which considers both the ordering of sentences and the interaction of words in composing text meaning from word and sentence meanings. DisCoCirc fundamentally modifies DisCoCat, since a sentence in DisCoCirc updates the meanings of words, whereas all meanings are static in DisCoCat. It is not clear whether DisCoCirc can be implemented in lambeq / DisCoPy without breaking DisCoCat.
https://arxiv.org/abs/2303.02469
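A purely classical sketch of the first approach, weighted bag-of-sentences: per-sentence class probabilities (e.g., outputs of a VQSC) are combined under task-specific sentence weights. The softmax normalization of the weights is an assumption for illustration.

    import numpy as np

    def weighted_bag_of_sentences(sentence_probs, weights):
        # sentence_probs: (num_sentences, num_classes) per-sentence outputs
        # weights: task-specific sentence weights, normalized via softmax
        w = np.exp(weights - weights.max())
        w = w / w.sum()
        return (w[:, None] * np.asarray(sentence_probs)).sum(axis=0)

    probs = [[0.8, 0.2], [0.3, 0.7]]          # two sentences of one review
    print(weighted_bag_of_sentences(probs, np.array([1.0, 2.0])))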
Internet memes are characterised by the interspersing of text amongst visual elements. State-of-the-art multimodal meme classifiers do not account for the relative positions of these elements across the two modalities, despite the latent meaning associated with where text and visual elements are placed. On two meme sentiment classification datasets, we systematically show performance gains from incorporating the spatial positions of visual objects, faces, and text clusters extracted from memes. In addition, we present facial embedding as an impactful enhancement to image representation in a multimodal meme classifier. Finally, we show that incorporating this spatial information allows our fully automated approaches to outperform their corresponding baselines that rely on additional human validation of OCR-extracted text.
https://arxiv.org/abs/2303.01781
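One way to operationalize "incorporating spatial position" is to append normalized bounding-box features to each element's embedding before multimodal fusion; the feature set below is an illustrative assumption, not the paper's exact recipe.

    import torch

    def spatial_features(box, image_w, image_h):
        # normalized centre, size, and area of one detected element
        # (text cluster, face, or object); the feature set is an assumption
        x0, y0, x1, y1 = box
        w, h = (x1 - x0) / image_w, (y1 - y0) / image_h
        cx, cy = (x0 + x1) / (2 * image_w), (y0 + y1) / (2 * image_h)
        return torch.tensor([cx, cy, w, h, w * h])

    # each element's embedding is concatenated with where it sits in the meme
    text_vec = torch.randn(768)                         # e.g. an OCR-text embedding
    enriched = torch.cat([text_vec, spatial_features((10, 20, 200, 60), 500, 500)])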
Gradient-based explanation methods play an important role in interpreting complex deep neural networks for NLP. However, existing work has shown that the gradients of a model are unstable and easily manipulable, which largely undermines the model's reliability. According to our preliminary analyses, we also find that the interpretability of gradient-based methods is limited for complex tasks, such as aspect-based sentiment classification (ABSC). In this paper, we propose IEGA, an Interpretation-Enhanced Gradient-based framework for ABSC that uses a small number of explanation annotations. In particular, we first compute a word-level saliency map from gradients to measure the importance of each word in the sentence toward the given aspect. Then, we design a gradient correction module to enhance the model's attention on the correct parts (e.g., opinion words). Our framework is model-agnostic and task-agnostic, so it can be integrated into existing ABSC methods or other tasks. Comprehensive experimental results on four benchmark datasets show that IEGA improves not only the interpretability of the model but also its performance and robustness.
https://arxiv.org/abs/2302.10479
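A hedged sketch of the first IEGA step, the gradient-based word-level saliency map, for any Hugging Face sequence classification model; this is a generic implementation of the idea, not the authors' code.

    import torch

    def word_saliency(model, embeddings, attention_mask, aspect_label):
        # embeddings: (1, L, H) input embeddings of the sentence-aspect pair
        embeddings = embeddings.clone().detach().requires_grad_(True)
        logits = model(inputs_embeds=embeddings,
                       attention_mask=attention_mask).logits
        logits[0, aspect_label].backward()
        # gradient magnitude per token approximates each word's importance
        saliency = embeddings.grad.norm(dim=-1).squeeze(0)      # (L,)
        return saliency * attention_mask.squeeze(0)             # zero out padding

The gradient correction module would then add a loss term pushing this saliency distribution toward the annotated opinion words.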
Africa is home to over 2000 languages from over six language families and has the highest linguistic diversity among all continents. This includes 75 languages with at least one million speakers each. Yet, there is little NLP research conducted on African languages. Crucial in enabling such research is the availability of high-quality annotated datasets. In this paper, we introduce AfriSenti, which consists of 14 sentiment datasets of 110,000+ tweets in 14 African languages (Amharic, Algerian Arabic, Hausa, Igbo, Kinyarwanda, Moroccan Arabic, Mozambican Portuguese, Nigerian Pidgin, Oromo, Swahili, Tigrinya, Twi, Xitsonga, and Yorùbá) from four language families, annotated by native speakers. The data is used in SemEval 2023 Task 12, the first Afro-centric SemEval shared task. We describe the data collection methodology, annotation process, and related challenges in curating each of the datasets. We conduct experiments with different sentiment classification baselines and discuss their usefulness. We hope AfriSenti enables new work on under-represented languages. The dataset is available at this https URL and can also be loaded as a Hugging Face dataset (this https URL).
https://arxiv.org/abs/2302.08956
In this paper, we present InstructABSA, an instruction-learning paradigm for all Aspect-Based Sentiment Analysis (ABSA) subtasks: Aspect Term Extraction (ATE), Aspect Term Sentiment Classification (ATSC), and Joint Task modeling. Our method introduces positive, negative, and neutral examples to each training sample and instruction-tunes the model (Tk-Instruct Base) for each ABSA subtask, yielding significant performance improvements. Experimental results on the SemEval 2014 dataset demonstrate that InstructABSA outperforms the previous state-of-the-art (SOTA) approaches on all three ABSA subtasks (ATE, ATSC, and Joint Task) by a significant margin, surpassing models 7x its size. In particular, InstructABSA beats the SOTA on the restaurant ATE subtask by 7.31 percentage points and on the laptop Joint Task by 8.63 percentage points. Our results also suggest strong generalization to unseen tasks across all three subtasks.
https://arxiv.org/abs/2302.08624
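A sketch of an instruction with positive, negative, and neutral in-context examples, run through a Tk-Instruct checkpoint; the prompt wording, the examples, and the checkpoint name are assumptions, not the paper's exact setup.

    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # illustrative ATE instruction; the paper's actual prompt may differ
    instruction = (
        "Definition: extract the aspect terms from the review.\n"
        "Positive example: 'The battery life is great.' -> battery life\n"
        "Negative example: 'I hate the keyboard.' -> keyboard\n"
        "Neutral example: 'It has a touch screen.' -> touch screen\n"
    )
    review = "The food was amazing but the service was slow."

    name = "allenai/tk-instruct-base-def-pos"   # assumed Tk-Instruct Base checkpoint
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    inputs = tok(instruction + "Now complete: " + review, return_tensors="pt")
    print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0],
                     skip_special_tokens=True))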
This paper explores the integration of symbolic logic knowledge into deep neural networks for learning from noisy crowd labels. We introduce Logic-guided Learning from Noisy Crowd Labels (Logic-LNCL), an EM-like iterative logic knowledge distillation framework that learns from both noisy labeled data and logic rules of interest. Unlike traditional EM methods, our framework contains a "pseudo-E-step" that distills from the logic rules a new type of learning target, which is then used in the "pseudo-M-step" for training the classifier. Extensive evaluations on two real-world datasets for text sentiment classification and named entity recognition demonstrate that the proposed framework improves the state-of-the-art and provides a new solution to learning from noisy crowd labels.
https://arxiv.org/abs/2302.06337
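A minimal skeleton of the two alternating steps described above, assuming the distilled target is a normalized mixture of classifier beliefs and rule predictions; the paper's actual distillation is more involved.

    import torch.nn.functional as F

    def pseudo_e_step(classifier_probs, rule_probs, balance=0.5):
        # distill logic-rule predictions and current classifier beliefs into
        # a soft learning target (the mixing rule here is an assumption)
        target = balance * classifier_probs + (1 - balance) * rule_probs
        return target / target.sum(dim=-1, keepdim=True)

    def pseudo_m_step(model, optimizer, x, soft_target):
        # fit the classifier toward the distilled target
        optimizer.zero_grad()
        log_probs = F.log_softmax(model(x), dim=-1)
        loss = F.kl_div(log_probs, soft_target, reduction="batchmean")
        loss.backward()
        optimizer.step()
        return loss.item()

Iterating the two steps mirrors classic EM, except that the "expectation" is taken over the logic rules rather than latent true labels alone.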
We propose two methods to make unsupervised domain adaptation (UDA) more parameter-efficient using adapters: small bottleneck layers interspersed with every layer of the large-scale pre-trained language model (PLM). The first method deconstructs UDA into a two-step process: first adding a domain adapter to learn domain-invariant information, and then adding a task adapter that uses this domain-invariant information to learn task representations in the source domain. The second method jointly learns a supervised classifier while reducing the divergence measure. Compared to strong baselines, our simple methods perform well on natural language inference (MNLI) and the cross-domain sentiment classification task. We even outperform unsupervised domain adaptation methods such as DANN and DSN on sentiment classification, and we are within 0.85% F1 on the natural language inference task, while fine-tuning only a fraction of the full model parameters. We release our code at this https URL
https://arxiv.org/abs/2302.03194
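The adapters referred to above are standard bottleneck modules; a minimal sketch follows, with dimensions that follow common practice rather than the paper's exact configuration. In the two-step method, one such adapter would first be trained on unlabeled data from both domains, and a second, stacked adapter on labeled source data.

    import torch.nn as nn

    class Adapter(nn.Module):
        """Bottleneck adapter inserted after every PLM layer; only these
        few parameters are trained, the PLM itself stays frozen."""
        def __init__(self, hidden=768, bottleneck=64):
            super().__init__()
            self.down = nn.Linear(hidden, bottleneck)   # project down
            self.up = nn.Linear(bottleneck, hidden)     # project back up
            self.act = nn.GELU()

        def forward(self, x):
            return x + self.up(self.act(self.down(x)))  # residual connection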
A recent line of work in NLP focuses on the (dis)ability of models to generalise compositionally for artificial languages. However, when considering natural language tasks, the data involved is not strictly, or locally, compositional. Quantifying the compositionality of data is a challenging task, which has been investigated primarily for short utterances. We use recursive neural models (Tree-LSTMs) with bottlenecks that limit the transfer of information between nodes. We show that comparing a model's representations of the data with and without the bottleneck yields a compositionality metric. The procedure is applied to the evaluation of arithmetic expressions using synthetic data, and to sentiment classification using natural language data. We demonstrate that compression through a bottleneck impacts non-compositional examples disproportionately, and then use the bottleneck compositionality metric (BCM) to distinguish compositional from non-compositional samples, yielding a compositionality ranking over a dataset.
https://arxiv.org/abs/2301.13714
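The core of the metric is a per-example comparison of the two models' representations; a minimal sketch, assuming a simple Euclidean distance (the paper's exact comparison may differ):

    import torch

    def bcm_scores(reps_full, reps_bottleneck):
        # reps_*: (num_examples, dim) representations from the Tree-LSTM
        # with and without the information bottleneck; a larger change
        # under compression suggests the example is less compositional
        return torch.norm(reps_full - reps_bottleneck, dim=-1)

    scores = bcm_scores(torch.randn(100, 64), torch.randn(100, 64))
    ranking = torch.argsort(scores)   # compositionality ranking over the dataset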
As global digitization continues to grow, technology becomes more affordable and easier to use, and social media platforms thrive, becoming the new means of spreading information and news. Communities are built around sharing and discussing current events, and within these communities users can share their opinions about each event. Using sentiment analysis to understand the polarity of each message belonging to an event, as well as of the event as a whole, can help to better understand the general and individual feelings behind significant trends and the dynamics of online social networks. In this context, we propose a new ensemble architecture, EDSA-Ensemble (Event Detection Sentiment Analysis Ensemble), that uses event detection and sentiment analysis to improve the detection of the polarity of current events from social media. For event detection, we use techniques based on information diffusion that take into account both the time span and the topics. To detect the polarity of each event, we preprocess the text and employ several machine and deep learning models to create an ensemble. The preprocessing step includes several word representation models, i.e., raw frequency, TF-IDF, Word2Vec, and Transformers. The proposed EDSA-Ensemble architecture improves event sentiment classification over the individual machine and deep learning models.
https://arxiv.org/abs/2301.12805
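A small sketch of the polarity ensemble over different word representations, using soft voting (averaged class probabilities). The toy data and member choice are illustrative; the full system also includes Word2Vec/Transformer features and deep models.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = ["great news about the event", "terrible handling of the event",
             "fantastic result", "awful outcome"]
    labels = [1, 0, 1, 0]

    # two ensemble members built on different word representations
    members = [
        make_pipeline(CountVectorizer(), MultinomialNB()),       # raw frequency
        make_pipeline(TfidfVectorizer(), LogisticRegression()),  # TF-IDF
    ]
    for m in members:
        m.fit(texts, labels)

    # soft voting: average the members' class probabilities
    test = ["wonderful news today"]
    probs = np.mean([m.predict_proba(test) for m in members], axis=0)
    print(probs.argmax(axis=1))   # ensemble polarity prediction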
Review summarization is a non-trivial task that aims to summarize the main idea of a product review on an e-commerce website. Unlike document summarization, which only needs to focus on the main facts described in the document, review summarization should not only summarize the main aspects mentioned in the review but also reflect the personal style of the review's author. Although existing review summarization methods have incorporated the historical reviews of both the customer and the product, they usually simply concatenate these two heterogeneous sources of information and model them indiscriminately as one long sequence. Moreover, although the rating information provides a high-level abstraction of customer preference, it has not been used by the majority of methods. In this paper, we propose the Heterogeneous Historical Review aware Review Summarization Model (HHRRS), which separately models the two types of historical reviews together with the rating information via a graph reasoning module with a contrastive loss. We employ a multi-task framework that conducts review sentiment classification and summarization jointly. Extensive experiments on four benchmark datasets demonstrate the superiority of HHRRS on both tasks.
https://arxiv.org/abs/2301.11682
Extraction of sentiment signals from news text, stock message boards, and business reports for stock movement prediction has been a rising field of interest in finance. Building upon past literature, the most recent works attempt to better capture sentiment from sentences with complex syntactic structures by introducing aspect-level sentiment classification (ASC). Despite the growing interest, however, fine-grained sentiment analysis has not been fully explored in non-English literature due to the shortage of annotated finance-specific data. Accordingly, it is necessary for non-English languages to leverage datasets and pre-trained language models (PLMs) from different domains, languages, and tasks to maximize their performance. To facilitate finance-specific ASC research in the Korean language, we build KorFinASC, a Korean aspect-level sentiment classification dataset for finance consisting of 12,613 human-annotated samples, and explore methods of intermediate transfer learning. Our experiments indicate that past research has been ignorant of the potentially wrong knowledge of financial entities encoded during the training phase, which has led to overestimating the predictive power of PLMs. In our work, we use the term "non-stationary knowledge" to refer to information that was previously correct but is likely to change, and present "TGT-Masking", a novel masking pattern that restricts PLMs from speculating on knowledge of this kind. Finally, through a series of transfer learning steps with TGT-Masking applied, we improve classification accuracy by 22.63% over standalone models on KorFinASC.
https://arxiv.org/abs/2301.03136
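A toy sketch of the masking idea: mentions of the target financial entity are replaced before the text reaches the PLM, so the model cannot fall back on memorized, possibly outdated knowledge about that entity. The exact masking pattern in the paper may differ.

    def tgt_mask(tokens, target_spans, mask_token="[MASK]"):
        # replace target-entity tokens so the PLM cannot lean on memorized,
        # possibly outdated ("non-stationary") knowledge about the entity
        out = list(tokens)
        for start, end in target_spans:
            for i in range(start, end):
                out[i] = mask_token
        return out

    print(tgt_mask(["Samsung", "shares", "rose", "5%"], [(0, 1)]))
    # -> ['[MASK]', 'shares', 'rose', '5%']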
To the best of our knowledge, this paper makes the first attempt to answer whether word segmentation is necessary for Vietnamese sentiment classification. To do this, we present five pre-trained monolingual S4-based language models for Vietnamese: one model without word segmentation, and four models using the RDRsegmenter, uitnlp, pyvi, or underthesea toolkits in the data pre-processing phase. Based on comprehensive experimental results on two corpora, the VLSP2016-SA corpus of technical article reviews from news and social media and the UIT-VSFC corpus of educational surveys, we offer two suggestions. Firstly, with traditional classifiers like Naive Bayes or Support Vector Machines, word segmentation may not be necessary for Vietnamese sentiment classification corpora drawn from the social domain. Secondly, word segmentation is necessary for Vietnamese sentiment classification when it is applied before BPE and the result is fed into a deep learning model. In this setting, RDRsegmenter is the most stable word segmentation toolkit among uitnlp, pyvi, and underthesea.
https://arxiv.org/abs/2301.00418
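Two of the toolkits compared above are pip-installable, and a quick look at their output shows the underscore-joined multi-syllable words that Vietnamese word segmentation produces (the example sentence is ours, for illustration):

    # pip install pyvi underthesea
    from pyvi import ViTokenizer
    from underthesea import word_tokenize

    sent = "Sản phẩm này rất tốt"                   # "This product is very good"
    print(ViTokenizer.tokenize(sent))               # pyvi segmentation
    print(word_tokenize(sent, format="text"))       # underthesea segmentation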
Learning models are highly dependent on data to work effectively, and they perform better when trained on big datasets. A large body of research addresses the dataset adequacy issue, and one promising approach for solving it is data augmentation (DA). In DA, the number of training instances is increased by applying different transformations to the available instances to generate new, correct, and representative ones. DA increases the dataset's size and variability, which enhances model performance and prediction accuracy, and it also mitigates the class imbalance problem in classification learning techniques. Few studies have recently considered DA for the Arabic language, and these rely on traditional augmentation approaches such as rule-based paraphrasing or noising-based techniques. In this paper, we propose a new Arabic DA method that employs a recent powerful modeling technique, namely AraGPT-2, for the augmentation process. The generated sentences are evaluated in terms of context, semantics, diversity, and novelty using the Euclidean, cosine, Jaccard, and BLEU distances. Finally, the AraBERT transformer is used on sentiment classification tasks to evaluate the classification performance of the augmented Arabic dataset. The experiments were conducted on four Arabic sentiment datasets, namely AraSarcasm, ASTD, ATT, and MOVIE, which vary in size, number of labels, and class imbalance. The results show that the proposed methodology enhanced Arabic sentiment text classification on all datasets, with an increase in F1 score of 4% on AraSarcasm, 6% on ASTD, 9% on ATT, and 13% on MOVIE.
https://arxiv.org/abs/2212.13939
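The similarity side of that evaluation is easy to reproduce with standard libraries; in this sketch an English toy pair stands in for the original and AraGPT-2-generated Arabic sentences:

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def jaccard(a, b):
        sa, sb = set(a.split()), set(b.split())
        return len(sa & sb) / len(sa | sb)

    original = "the movie was wonderful and moving"
    generated = "the film was wonderful and touching"  # stand-in for a generated sentence

    vecs = TfidfVectorizer().fit_transform([original, generated])
    print("cosine :", cosine_similarity(vecs[0], vecs[1])[0, 0])
    print("jaccard:", jaccard(original, generated))
    print("bleu   :", sentence_bleu([original.split()], generated.split(),
                                    smoothing_function=SmoothingFunction().method1))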
Multilingual Pretrained Language Models (MPLMs) have shown their strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline to improve the zero-shot performance on low-resource languages (LRLs) by augmenting the context with semantically similar sentences retrieved from a high-resource language (HRL) as prompts. PARC improves the zero-shot performance on three downstream tasks (binary sentiment classification, topic categorization, and natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families, in both unlabeled settings (+5.1%) and labeled settings (+16.3%). PARC-labeled also outperforms the fine-tuning baseline by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on the one hand, and both the similarity between the high- and low-resource languages and the amount of low-resource pretraining data on the other. A robustness analysis suggests that PARC has the potential to achieve even stronger performance with more powerful MPLMs.
https://arxiv.org/abs/2212.09651
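The retrieval step can be sketched with an off-the-shelf multilingual sentence encoder; the model choice, the labeled pool, and the cloze-style prompt format below are illustrative assumptions, not the paper's exact pipeline:

    from sentence_transformers import SentenceTransformer, util

    encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    hrl_pool = ["The movie was fantastic.", "The food was awful."]  # labeled HRL (English) pool
    lrl_input = "Chakula kilikuwa kibaya sana."                     # LRL (Swahili) test input

    # retrieve the semantically closest HRL sentence cross-lingually
    scores = util.cos_sim(encoder.encode(lrl_input), encoder.encode(hrl_pool))[0]
    retrieved = hrl_pool[int(scores.argmax())]

    # the retrieved sentence, with its label verbalized, is prepended as a prompt
    prompt = f"{retrieved} It was bad. {lrl_input} It was"
    print(prompt)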
The COVID-19 pandemic has caused drastic alterations in human life in all aspects, and the government regulations introduced in response affected everyone's lifestyle. Studying the sentiment of individuals is therefore essential for anticipating the impact of future pandemics. To contribute to this aim, we propose an NLP (Natural Language Processing) model to analyze open-text answers to a survey in Persian and detect positive and negative feelings of people in Iran. In this study, a DistilBERT transformer model was applied to the task. We deployed three approaches for comparison, and our best model achieved an accuracy of 0.824, precision of 0.824, recall of 0.798, and an F1 score of 0.804.
https://arxiv.org/abs/2212.08407
This paper presents a cross-lingual sentiment analysis of news articles using zero-shot and few-shot learning. The study aims to classify Croatian news articles as positive, negative, or neutral by leveraging a Slovene dataset. The system is based on a trilingual BERT-based model trained on three languages: English, Slovene, and Croatian. The paper analyses different setups using datasets in the two languages and proposes a simple multi-task model to perform sentiment classification. The evaluation is performed in few-shot and zero-shot scenarios, in single-task and multi-task experiments, for Croatian and Slovene.
https://arxiv.org/abs/2212.07160
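A minimal sketch of a simple multi-task setup of this kind: a shared trilingual encoder with one classification head per dataset. The checkpoint name (a public Croatian-Slovene-English BERT) and the head layout are assumptions, not necessarily the paper's configuration.

    import torch.nn as nn
    from transformers import AutoModel

    class MultiTaskSentiment(nn.Module):
        def __init__(self, name="EMBEDDIA/crosloengual-bert",
                     num_tasks=2, num_classes=3):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(name)  # shared trilingual encoder
            h = self.encoder.config.hidden_size
            self.heads = nn.ModuleList(
                nn.Linear(h, num_classes) for _ in range(num_tasks))

        def forward(self, task_id, **inputs):
            pooled = self.encoder(**inputs).last_hidden_state[:, 0]  # [CLS]
            return self.heads[task_id](pooled)                       # task-specific logits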