Aspect-based sentiment analysis (ABSA) is a refined approach to sentiment analysis that aims to extract and classify sentiments based on specific aspects or features of a product, service, or entity. Unlike traditional sentiment analysis, which assigns a general sentiment score to entire reviews or texts, ABSA breaks the text down into individual components or aspects (e.g., quality, price, service) and evaluates the sentiment towards each. This allows for a more granular understanding of customer opinions, enabling businesses to pinpoint specific areas of strength and improvement. The process involves several key steps, including aspect extraction, sentiment classification, and aspect-level sentiment aggregation over a review paragraph or whatever other form of input the user provides. ABSA has significant applications in areas such as product reviews, social media monitoring, customer feedback analysis, and market research. By leveraging techniques from natural language processing (NLP) and machine learning, ABSA facilitates the extraction of valuable insights, enabling companies to make data-driven decisions that enhance customer satisfaction and optimize offerings. As ABSA evolves, it holds the potential to greatly improve personalized customer experiences by providing a deeper understanding of sentiment across various product aspects. In this work, we analyze the strength of LLMs for complete cross-domain aspect-based sentiment analysis, with the aim of defining a framework for certain products and reusing it in other, similar settings. We argue that it is possible to do so at an accuracy of 92\% on the Aspect-Based Sentiment Analysis dataset of SemEval-2015 Task 12.
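A minimal sketch of the pipeline described above (aspect extraction, per-aspect sentiment classification, and aspect-level aggregation) using an instruction-following LLM; the client library, model name, and prompt wording are illustrative assumptions rather than the authors' setup:

```python
# Sketch: aspect extraction + per-aspect sentiment with an LLM, followed by
# aspect-level aggregation over many reviews. Model name and prompt are assumed.
import json
from collections import Counter, defaultdict
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def absa(review: str) -> dict:
    """Ask the LLM to list aspects and a sentiment label for each."""
    prompt = (
        "Extract the aspects mentioned in this review and label each one "
        "positive, negative, or neutral. Reply as JSON: "
        '{"aspects": [{"aspect": "...", "sentiment": "..."}]}\n\n'
        f"Review: {review}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(resp.choices[0].message.content)

def aggregate(results: list) -> dict:
    """Majority sentiment per aspect across a batch of reviews."""
    votes = defaultdict(Counter)
    for r in results:
        for a in r["aspects"]:
            votes[a["aspect"].lower()][a["sentiment"]] += 1
    return {asp: c.most_common(1)[0][0] for asp, c in votes.items()}
```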
https://arxiv.org/abs/2501.08974
Sentiment analysis is one of the most crucial tasks in Natural Language Processing (NLP), involving the training of machine learning models to classify text based on the polarity of opinions. Pre-trained Language Models (PLMs) can be applied to downstream tasks through fine-tuning, eliminating the need to train the model from scratch. Specifically, PLMs have been employed for Sentiment Analysis, a process that involves detecting, analyzing, and extracting the polarity of text sentiments. Numerous models have been proposed to address this task, with pre-trained PhoBERT-V2 models standing out as the state-of-the-art language models for Vietnamese. The PhoBERT-V2 pre-training approach is based on RoBERTa, optimizing the BERT pre-training method for more robust performance. In this paper, we introduce a novel approach that combines PhoBERT-V2 and SentiWordNet for Sentiment Analysis of Vietnamese reviews. Our proposed model utilizes PhoBERT-V2 for Vietnamese, offering a robust optimization of the prominent BERT model in the context of the Vietnamese language, and leverages SentiWordNet, a lexical resource explicitly designed to support sentiment classification applications. Experimental results on the VLSP 2016 and AIVIVN 2019 datasets demonstrate that our sentiment analysis system has achieved excellent performance in comparison to other models.
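As a hedged illustration of the combination described here, the sketch below fuses a PhoBERT-V2 classifier's probabilities with a SentiWordNet lexicon prior; the fusion weight, the three-way label scheme, and the use of NLTK's English SentiWordNet interface are assumptions, not the paper's exact recipe:

```python
# Sketch: fuse a (fine-tuned) PhoBERT-V2 classifier's probabilities with a
# SentiWordNet lexicon score. Fusion weight and lexicon handling are assumed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from nltk.corpus import sentiwordnet as swn  # needs nltk.download("sentiwordnet"), ("wordnet")

tok = AutoTokenizer.from_pretrained("vinai/phobert-base-v2")
model = AutoModelForSequenceClassification.from_pretrained(
    "vinai/phobert-base-v2", num_labels=3)  # would be the VLSP/AIVIVN fine-tuned checkpoint

def lexicon_score(tokens):
    """Mean (pos - neg) score of the first SentiWordNet synset per token."""
    scores = []
    for t in tokens:
        synsets = list(swn.senti_synsets(t))
        if synsets:
            scores.append(synsets[0].pos_score() - synsets[0].neg_score())
    return sum(scores) / len(scores) if scores else 0.0

def predict(text, alpha=0.8):
    enc = tok(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = model(**enc).logits.softmax(-1).squeeze(0)  # [neg, neu, pos]
    lex = lexicon_score(text.split())
    lex_probs = torch.tensor([max(-lex, 0.0), 1.0 - abs(lex), max(lex, 0.0)])
    fused = alpha * probs + (1 - alpha) * lex_probs  # weighted combination
    return ["negative", "neutral", "positive"][int(fused.argmax())]
```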
https://arxiv.org/abs/2501.08758
This paper explores the development of a multimodal sentiment analysis model that integrates text, audio, and visual data to enhance sentiment classification. The goal is to improve emotion detection by capturing the complex interactions between these modalities, thereby enabling more accurate and nuanced sentiment interpretation. The study evaluates three feature fusion strategies -- late stage fusion, early stage fusion, and multi-headed attention -- within a transformer-based architecture. Experiments were conducted using the CMU-MOSEI dataset, which includes synchronized text, audio, and visual inputs labeled with sentiment scores. Results show that early stage fusion significantly outperforms late stage fusion, achieving an accuracy of 71.87\%, while the multi-headed attention approach offers marginal improvement, reaching 72.39\%. The findings suggest that integrating modalities early in the process enhances sentiment classification, while attention mechanisms may have limited impact within the current framework. Future work will focus on refining feature fusion techniques, incorporating temporal data, and exploring dynamic feature weighting to further improve model performance.
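The two baseline fusion strategies are easy to contrast in code. Below is a minimal PyTorch sketch, with assumed feature dimensions and single-layer heads standing in for the paper's transformer-based architecture:

```python
# Sketch of the two basic fusion strategies compared in the paper, in PyTorch.
# Feature sizes and the shallow heads are illustrative assumptions.
import torch
import torch.nn as nn

TEXT_D, AUDIO_D, VIS_D, N_CLASSES = 768, 74, 35, 3  # CMU-MOSEI-like dims (assumed)

class EarlyFusion(nn.Module):
    """Concatenate modality features first, then classify jointly."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(TEXT_D + AUDIO_D + VIS_D, 256), nn.ReLU(),
            nn.Linear(256, N_CLASSES))

    def forward(self, text, audio, vis):
        return self.head(torch.cat([text, audio, vis], dim=-1))

class LateFusion(nn.Module):
    """Classify each modality separately, then average the logits."""
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Linear(d, N_CLASSES) for d in (TEXT_D, AUDIO_D, VIS_D))

    def forward(self, text, audio, vis):
        logits = [h(x) for h, x in zip(self.heads, (text, audio, vis))]
        return torch.stack(logits).mean(dim=0)

# e.g. EarlyFusion()(torch.randn(8, TEXT_D), torch.randn(8, AUDIO_D), torch.randn(8, VIS_D))
```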
https://arxiv.org/abs/2501.08085
Understanding emotions in videos is a challenging task. However, videos contain several modalities which make them a rich source of data for machine learning and deep learning tasks. In this work, we aim to improve video sentiment classification by focusing on three key aspects: the video itself, the accompanying text, and the acoustic features. To address the limitations of relying on large labeled datasets, we are developing a method that utilizes clustering-based semi-supervised pre-training to extract meaningful representations from the data. This pre-training step identifies patterns in the video and text data, allowing the model to learn underlying structures and relationships without requiring extensive labeled information at the outset. Once these patterns are established, we fine-tune the system in a supervised manner to classify the sentiment expressed in videos. We believe that this multi-modal approach, combining clustering with supervised fine-tuning, will lead to more accurate and insightful sentiment classification, especially in cases where labeled data is limited.
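A compact sketch of the two-stage recipe, assuming pre-extracted multimodal feature vectors: KMeans pseudo-labels pre-train a shared encoder, after which a fresh head is fine-tuned on the labeled sentiment data. Dimensions, cluster count, and architecture are illustrative:

```python
# Sketch: clustering-based pre-training (stage 1) then supervised fine-tuning
# (stage 2). All sizes are assumptions; features stand in for real extractors.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

FEAT_D, N_CLUSTERS, N_CLASSES = 512, 8, 3

encoder = nn.Sequential(nn.Linear(FEAT_D, 256), nn.ReLU())

def train(head, x, y, epochs=20):
    model = nn.Sequential(encoder, head)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()

# Stage 1: pre-train the encoder on KMeans pseudo-labels (no sentiment labels).
x_unlab = torch.randn(1000, FEAT_D)  # stand-in for fused video/text/audio features
pseudo = KMeans(n_clusters=N_CLUSTERS, n_init=10).fit_predict(x_unlab.numpy())
train(nn.Linear(256, N_CLUSTERS), x_unlab, torch.tensor(pseudo, dtype=torch.long))

# Stage 2: fine-tune with a fresh sentiment head; the encoder weights carry over.
x_lab = torch.randn(64, FEAT_D)
y_lab = torch.randint(0, N_CLASSES, (64,))
train(nn.Linear(256, N_CLASSES), x_lab, y_lab)
```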
https://arxiv.org/abs/2501.06475
Bidirectional transformers excel at sentiment analysis, and Large Language Models (LLM) are effective zero-shot learners. Might they perform better as a team? This paper explores collaborative approaches between ELECTRA and GPT-4o for three-way sentiment classification. We fine-tuned (FT) four models (ELECTRA Base/Large, GPT-4o/4o-mini) using a mix of reviews from Stanford Sentiment Treebank (SST) and DynaSent. We provided input from ELECTRA to GPT as: predicted label, probabilities, and retrieved examples. Sharing ELECTRA Base FT predictions with GPT-4o-mini significantly improved performance over either model alone (82.74 macro F1 vs. 79.29 ELECTRA Base FT, 79.52 GPT-4o-mini) and yielded the lowest cost/performance ratio (\$0.12/F1 point). However, when GPT models were fine-tuned, including predictions decreased performance. GPT-4o FT-M was the top performer (86.99), with GPT-4o-mini FT close behind (86.77) at much less cost (\$0.38 vs. \$1.59/F1 point). Our results show that augmenting prompts with predictions from fine-tuned encoders is an efficient way to boost performance, and a fine-tuned GPT-4o-mini is nearly as good as GPT-4o FT at 76% less cost. Both are affordable options for projects with limited resources.
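A hedged sketch of the best-performing collaborative setup (a fine-tuned ELECTRA's prediction and probabilities shared with GPT-4o-mini in the prompt); the prompt wording is assumed, and the base checkpoint stands in for the paper's fine-tuned model:

```python
# Sketch: pass a fine-tuned ELECTRA classifier's probabilities to GPT-4o-mini
# inside the prompt. Prompt wording and checkpoint name are assumptions.
import torch
from openai import OpenAI
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["negative", "neutral", "positive"]
tok = AutoTokenizer.from_pretrained("google/electra-base-discriminator")
electra = AutoModelForSequenceClassification.from_pretrained(
    "google/electra-base-discriminator", num_labels=3)  # would be the FT checkpoint
client = OpenAI()

def classify(text: str) -> str:
    with torch.no_grad():
        probs = electra(**tok(text, return_tensors="pt")).logits.softmax(-1).squeeze(0)
    hint = ", ".join(f"{l}: {p:.2f}" for l, p in zip(LABELS, probs.tolist()))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   "Classify the sentiment (negative/neutral/positive).\n"
                   f"A fine-tuned ELECTRA model predicts: {hint}\n"
                   f"Text: {text}\nAnswer with one word."}],
    )
    return resp.choices[0].message.content.strip().lower()
```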
https://arxiv.org/abs/2501.00062
This research investigates the performance of various machine learning algorithms (CNN, LSTM, VADER, and RoBERTa) for sentiment analysis of Twitter data related to imported food items in Trinidad and Tobago. The study addresses three primary research questions: the comparative accuracy and efficiency of the algorithms, the optimal configurations for each model, and the potential applications of the optimized models in a live system for monitoring public sentiment and its impact on the import bill. The dataset comprises tweets from 2018 to 2024, divided into imbalanced, balanced, and temporal subsets to assess the impact of data balancing and the COVID-19 pandemic on sentiment trends. Ten experiments were conducted to evaluate the models under various configurations. Results indicated that VADER outperformed the other models in both multi-class and binary sentiment classifications. The study highlights significant changes in sentiment trends pre- and post-COVID-19, with implications for import policies.
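For reference, the VADER baseline reduces to a few lines with the vaderSentiment package; the conventional ±0.05 compound-score cutoffs below are an assumption, since the study's exact thresholds are not restated here:

```python
# Sketch of the rule-based VADER baseline with conventional compound cutoffs.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def vader_label(tweet: str) -> str:
    c = analyzer.polarity_scores(tweet)["compound"]
    if c >= 0.05:
        return "positive"
    if c <= -0.05:
        return "negative"
    return "neutral"

print(vader_label("Imported flour prices are outrageous this month!"))
```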
https://arxiv.org/abs/2412.19781
Fine-grained sentiment analysis (FSA) aims to extract and summarize user opinions from vast opinionated text. Recent studies demonstrate that large language models (LLMs) possess exceptional sentiment understanding capabilities. However, directly deploying LLMs for FSA applications incurs high inference costs. Therefore, this paper investigates the distillation of fine-grained sentiment understanding from LLMs into small language models (SLMs). We prompt LLMs to examine and interpret the sentiments of given reviews and then utilize the generated content to pretrain SLMs. Additionally, we develop a comprehensive FSA benchmark to evaluate both SLMs and LLMs. Extensive experiments on this benchmark reveal that: (1) distillation significantly enhances the performance of SLMs in FSA tasks, achieving a 6.00\% improvement in $F_1$-score, and the distilled model can outperform Llama-2-7b with only 220M parameters; (2) distillation equips SLMs with excellent zero-shot sentiment classification capabilities, enabling them to match or even exceed their teacher models. These results suggest that distillation from LLMs is a highly promising direction for FSA. We will release our code, data, and pretrained model weights at \url{this https URL}.
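A minimal sketch of the distillation loop as described: a teacher LLM writes sentiment interpretations of raw reviews, and a small causal LM continues pre-training on the generated text. The teacher/student model choices and the prompt are illustrative assumptions:

```python
# Sketch: generate teacher interpretations, then pre-train a small student LM
# on them. gpt-4o-mini and gpt2 are stand-ins for the paper's models.
import torch
from openai import OpenAI
from transformers import AutoModelForCausalLM, AutoTokenizer

client = OpenAI()

def teacher_interpret(review: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in teacher
        messages=[{"role": "user", "content":
                   "Examine this review and explain, aspect by aspect, what "
                   f"sentiments it expresses and why:\n{review}"}])
    return resp.choices[0].message.content

tok = AutoTokenizer.from_pretrained("gpt2")
student = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(student.parameters(), lr=5e-5)

for review in ["The battery lasts forever but the screen scratches easily."]:
    batch = tok(teacher_interpret(review), return_tensors="pt", truncation=True)
    loss = student(**batch, labels=batch["input_ids"]).loss  # LM objective
    loss.backward()
    opt.step()
    opt.zero_grad()
```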
https://arxiv.org/abs/2412.18552
This research explores the applicability of cross-lingual transfer learning from English to Japanese and Indonesian using the XLM-R pre-trained model. The results are compared with several previous works, either by models using a similar zero-shot approach or a fully-supervised approach, to provide an overview of the zero-shot transfer learning approach's capability using XLM-R in comparison with existing models. Our models achieve the best result in one Japanese dataset and comparable results in other datasets in Japanese and Indonesian languages without being trained using the target language. Furthermore, the results suggest that it is possible to train a multi-lingual model, instead of one model for each language, and achieve promising results.
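The zero-shot transfer setup reduces to fine-tuning XLM-R on English data and running inference directly on the target languages. A minimal sketch, with the binary label set assumed and training details omitted:

```python
# Sketch of zero-shot cross-lingual inference with XLM-R. In practice the
# loaded weights would be the checkpoint fine-tuned on English data only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)

def predict(text: str) -> str:
    with torch.no_grad():
        logits = model(**tok(text, return_tensors="pt")).logits
    return ["negative", "positive"][int(logits.argmax())]

# The same weights serve every language the tokenizer covers:
print(predict("この映画は本当に素晴らしかった"))  # Japanese
print(predict("Filmnya sangat mengecewakan"))     # Indonesian
```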
https://arxiv.org/abs/2412.18188
Sentiment analysis is a crucial task in natural language processing (NLP) with applications in public opinion monitoring, market research, and beyond. This paper introduces a three-class sentiment classification method for Weibo comments using Long Short-Term Memory (LSTM) networks to discern positive, neutral, and negative sentiments. LSTM, as a deep learning model, excels at capturing long-distance dependencies in text data, providing significant advantages over traditional machine learning approaches. Through preprocessing and feature extraction from Weibo comment texts, our LSTM model achieves precise sentiment prediction. Experimental results demonstrate superior performance, achieving an accuracy of 98.31% and an F1 score of 98.28%, notably outperforming conventional models and other deep learning methods. This underscores the effectiveness of LSTM in capturing nuanced sentiment information within text, thereby enhancing classification accuracy. Despite its strengths, the LSTM model faces challenges such as high computational complexity and slower processing times for lengthy texts. Moreover, complex emotional expressions like sarcasm and humor pose additional difficulties. Future work could explore combining pre-trained models or advancing feature engineering techniques to further improve both accuracy and practicality. Overall, this study provides an effective solution for sentiment analysis on Weibo comments.
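A minimal Keras sketch of such a three-class LSTM classifier; vocabulary size, sequence length, and layer widths are illustrative assumptions rather than the paper's hyperparameters:

```python
# Sketch of a three-class LSTM sentiment classifier in Keras (sizes assumed).
from tensorflow import keras
from tensorflow.keras import layers

VOCAB, MAXLEN, EMB = 50_000, 128, 128

model = keras.Sequential([
    keras.Input(shape=(MAXLEN,)),
    layers.Embedding(VOCAB, EMB),
    layers.LSTM(128),                       # captures long-distance dependencies
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # positive / neutral / negative
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=5)
```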
https://arxiv.org/abs/2412.17347
The increasing deployment of Large Language Models (LLMs) in various applications necessitates a rigorous evaluation of their robustness against adversarial attacks. In this paper, we present a comprehensive study on the robustness of the GPT LLM family. We employ two distinct evaluation methods to assess their resilience. The first method introduces character-level text attacks in input prompts, testing the models on three sentiment classification datasets: StanfordNLP/IMDB, Yelp Reviews, and SST-2. The second method involves using jailbreak prompts to challenge the safety mechanisms of the LLMs. Our experiments reveal significant variations in the robustness of these models, demonstrating their varying degrees of vulnerability to both character-level and semantic-level adversarial attacks. These findings underscore the necessity for improved adversarial training and enhanced safety mechanisms to bolster the robustness of LLMs.
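A hedged sketch of the first evaluation method: a simple character-level perturbation (adjacent-character swaps) applied to input text, whose effect on the model's predicted label can then be measured. The perturbation rate is an illustrative choice:

```python
# Sketch: character-level text attack via random adjacent-character swaps.
import random

def char_attack(text: str, rate: float = 0.1, seed: int = 0) -> str:
    rng = random.Random(seed)
    words = text.split()
    for i, w in enumerate(words):
        if len(w) > 3 and rng.random() < rate:
            j = rng.randrange(len(w) - 1)
            words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]  # swap neighbours
    return " ".join(words)

original = "The movie was absolutely wonderful and heartwarming."
print(char_attack(original, rate=0.5))
# A robustness check then measures how often the model's sentiment label flips.
```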
https://arxiv.org/abs/2412.17011
Sentiment analysis is an essential component of natural language processing, used to analyze sentiments, attitudes, and emotional tones in various contexts. It provides valuable insights into public opinion, customer feedback, and user experiences. Researchers have developed various classical machine learning and neuro-fuzzy approaches to address the exponential growth of data and the complexity of language structures in sentiment analysis. However, these approaches often fail to determine the optimal number of clusters, interpret results accurately, handle noise or outliers efficiently, and scale effectively to high-dimensional data. Additionally, they are frequently insensitive to input variations. In this paper, we propose a novel hybrid approach for sentiment analysis called the Quantum Fuzzy Neural Network (QFNN), which leverages quantum properties and incorporates a fuzzy layer to overcome the limitations of classical sentiment analysis algorithms. In this study, we test the proposed approach on two Twitter datasets: the Coronavirus Tweets Dataset (CVTD) and the General Sentimental Tweets Dataset (GSTD), and compare it with classical and hybrid algorithms. The results demonstrate that QFNN outperforms all classical, quantum, and hybrid algorithms, achieving 100% and 90% accuracy in the case of CVTD and GSTD, respectively. Furthermore, QFNN demonstrates its robustness against six different noise models, providing the potential to tackle the computational complexity associated with sentiment analysis on a large scale in a noisy environment. The proposed approach expedites sentiment data processing and precisely analyses different forms of textual data, thereby enhancing sentiment classification and insights associated with sentiment analysis.
https://arxiv.org/abs/2412.12731
This paper proposes a look ahead text understanding problem with look ahead section identification (LASI) as an example. This problem may appear in generative AI as well as human interactions, where we want to understand the direction of a developing text or conversation. We tackle the problem using transformer-based LLMs. We show that LASI is more challenging than classic section identification (SI). We argue that both bidirectional contextual information (e.g., BERT) and unidirectional predictive ability (e.g., GPT) will benefit the task. We propose two approaches to stitch together BERT and GPT. Experiments show that our approach outperforms the established models, especially when there is noise in the text (which is often the case for developing text in generative AI). Our paper sheds light on other look ahead text understanding tasks that are important to social media, such as look ahead sentiment classification, and points out the opportunities to leverage pre-trained LLMs through stitching.
https://arxiv.org/abs/2412.17836
With the rapid development of multimedia, the shift from unimodal textual sentiment analysis to multimodal image-text sentiment analysis has obtained academic and industrial attention in recent years. However, multimodal sentiment analysis is affected by unimodal data bias, e.g., text sentiment is misleading due to explicit sentiment semantic, leading to low accuracy in the final sentiment classification. In this paper, we propose a novel CounterFactual Multimodal Sentiment Analysis framework (CF-MSA) using causal counterfactual inference to construct multimodal sentiment causal inference. CF-MSA mitigates the direct effect from unimodal bias and ensures heterogeneity across modalities by differentiating the treatment variables between modalities. In addition, considering the information complementarity and bias differences between modalities, we propose a new optimisation objective to effectively integrate different modalities and reduce the inherent bias from each modality. Experimental results on two public datasets, MVSA-Single and MVSA-Multiple, demonstrate that the proposed CF-MSA has superior debiasing capability and achieves new state-of-the-art performances. We will release the code and datasets to facilitate future research.
https://arxiv.org/abs/2412.07292
Multidomain sentiment analysis involves estimating the polarity of an unstructured text by exploiting domain-specific information. One of the main issues common to the approaches discussed in the literature is their poor applicability to domains that differ from those used to construct the opinion resources. This paper aims to present a new method for Persian multidomain sentiment analysis using deep learning approaches. The proposed BERTCapsules approach consists of a combination of BERT and Capsule models. In this approach, BERT was used for instance representation, and the capsule structure was used to learn the extracted graphs. The Digikala dataset, comprising ten domains with both positive and negative polarity, was used to evaluate this approach. The BERTCaps model achieved an accuracy of 0.9712 in binary sentiment classification and 0.8509 in domain classification.
https://arxiv.org/abs/2412.05591
Integrated Gradients is a well-known technique for explaining deep learning models. It calculates feature importance scores with a gradient-based approach: computing gradients of the model output with respect to the input features and accumulating them along a linear path. While this works well for continuous feature spaces, it may not be the optimal way to deal with discrete spaces like word embeddings. For interpreting LLMs (Large Language Models), there is a need for a non-linear path where the intermediate points, whose gradients are to be computed, lie close to actual words in the embedding space. In this paper, we propose a method called Uniform Discretized Integrated Gradients (UDIG), based on a new interpolation strategy in which we choose a favorable non-linear path for computing attribution scores suitable for predictive language models. We evaluate our method on two types of NLP tasks, Sentiment Classification and Question Answering, against three metrics, viz. log odds, comprehensiveness, and sufficiency. For sentiment classification, we use the SST2, IMDb, and Rotten Tomatoes datasets for benchmarking, and for question answering, we use a BERT model fine-tuned on the SQuAD dataset. Our approach outperforms the existing methods in almost all the metrics.
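For contrast with UDIG's non-linear, word-anchored path, here is a minimal sketch of standard Integrated Gradients over input embeddings, using the linear interpolation path and a zero baseline; the SST-2 checkpoint and step count are illustrative choices:

```python
# Sketch: vanilla Integrated Gradients over word embeddings (linear path),
# i.e., the scheme UDIG modifies. Checkpoint and step count are assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("textattack/bert-base-uncased-SST-2")
model = AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-SST-2").eval()

def integrated_gradients(text, target, steps=50):
    ids = tok(text, return_tensors="pt")["input_ids"]
    emb = model.get_input_embeddings()(ids).detach()  # actual word embeddings
    baseline = torch.zeros_like(emb)                  # all-zeros baseline
    total = torch.zeros_like(emb)
    for k in range(1, steps + 1):
        point = (baseline + (k / steps) * (emb - baseline)).requires_grad_(True)
        logit = model(inputs_embeds=point).logits[0, target]
        grad, = torch.autograd.grad(logit, point)
        total += grad
    # Riemann approximation of the path integral, summed to one score per token.
    return ((emb - baseline) * total / steps).sum(-1).squeeze(0)

scores = integrated_gradients("A gorgeous, witty, seductive movie.", target=1)
```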
https://arxiv.org/abs/2412.03886
The COVID-19 pandemic has significantly transformed global lifestyles, enforcing physical isolation and accelerating digital adoption for work, education, and social interaction. This study examines the pandemic's impact on mental health by analyzing dream content shared on the Reddit r/Dreams community. With over 374,000 subscribers, this platform offers a rich dataset for exploring subconscious responses to the pandemic. Using statistical methods, we assess shifts in dream positivity, negativity, and neutrality from the pre-pandemic to post-pandemic era. To enhance our analysis, we fine-tuned the LLaMA 3.1-8B model with labeled data, enabling precise sentiment classification of dream content. Our findings aim to uncover patterns in dream content, providing insights into the psychological effects of the pandemic and its influence on subconscious processes. This research highlights the profound changes in mental landscapes and the role of dreams as indicators of public well-being during unprecedented times.
https://arxiv.org/abs/2501.07839
With strong expressive capabilities in Large Language Models (LLMs), generative models effectively capture sentiment structures and deep semantics; however, challenges remain in fine-grained sentiment classification across multi-lingual and complex contexts. To address this, we propose the Sentiment Cross-Lingual Recognition and Logic Framework (SentiXRL), which incorporates two modules: an emotion retrieval enhancement module, which improves sentiment classification accuracy in complex contexts through historical dialogue and logical reasoning, and a self-circulating analysis negotiation mechanism (SANM), which facilitates autonomous decision-making within a single model for classification tasks. We have validated SentiXRL's superiority on multiple standard datasets, outperforming existing models on CPED and CH-SIMS, and achieving overall better performance on MELD, EmoryNLP, and IEMOCAP. Notably, we unified labels across several fine-grained sentiment annotation datasets and conducted category confusion experiments, revealing the challenges and impacts of class imbalance in standard datasets.
https://arxiv.org/abs/2411.18162
Evaluating the importance of different layers in large language models (LLMs) is crucial for optimizing model performance and interpretability. This paper first explores layer importance using the Activation Variance-Sparsity Score (AVSS), which combines normalized activation variance and sparsity to quantify each layer's contribution to overall model performance. By ranking layers based on AVSS and pruning the least impactful 25\%, our experiments on tasks such as question answering, language modeling, and sentiment classification show that over 90\% of the original performance is retained, highlighting potential redundancies in LLM architectures. Building on AVSS, we propose an enhanced version tailored to assess hallucination propensity across layers (EAVSS). This improved approach introduces Hallucination-Specific Activation Variance (HSAV) and Hallucination-Specific Sparsity (HSS) metrics, allowing precise identification of hallucination-prone layers. By incorporating contrastive learning on these layers, we effectively mitigate hallucination generation, contributing to more robust and efficient LLMs (with a maximum performance improvement of 12\%). Our results on the NQ, SciQ, TriviaQA, TruthfulQA, and WikiQA datasets demonstrate the efficacy of this method, offering a comprehensive framework for both layer importance evaluation and hallucination mitigation in LLMs.
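A hedged sketch of the AVSS scoring step on a probe batch; the combination rule (variance times near-zero fraction) and the sparsity threshold are assumptions based on the abstract's description, not the paper's exact formula:

```python
# Sketch: per-layer activation variance and sparsity scores on a probe batch,
# then ranking layers for pruning. Scoring rule and threshold are assumed.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

def avss_scores(texts, eps=1e-2):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        hidden = model(**batch).hidden_states[1:]  # one tensor per layer
    scores = []
    for h in hidden:
        var = h.var().item()                              # activation variance
        sparsity = (h.abs() < eps).float().mean().item()  # near-zero fraction
        scores.append(var * sparsity)
    total = sum(scores)
    return [s / total for s in scores]  # normalized per-layer scores

scores = avss_scores(["The plot was thin but the acting was superb."])
prune = sorted(range(len(scores)), key=scores.__getitem__)[: len(scores) // 4]
print("lowest-scoring 25% of layers (pruning candidates):", prune)
```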
https://arxiv.org/abs/2411.10069
Moral sentiments expressed in natural language significantly influence both online and offline environments, shaping behavioral styles and interaction patterns, including social media self-presentation, cyberbullying, adherence to social norms, and ethical decision-making. To effectively measure moral sentiments in natural language processing texts, it is crucial to utilize large, annotated datasets that provide the nuanced understanding needed for accurate analysis and model training. However, existing corpora, while valuable, often face linguistic limitations. To address this gap in the Chinese language domain, we introduce the Moral Foundation Weibo Corpus. This corpus consists of 25,671 Chinese comments on Weibo, encompassing six diverse topic areas. Each comment is manually annotated by at least three systematically trained annotators based on ten moral categories derived from a grounded theory of morality. To assess annotator reliability, we present kappa test results, a gold standard for measuring consistency. Additionally, we apply several of the latest large language models to supplement the manual annotations, conducting analytical experiments to compare their performance and report baseline results for moral sentiment classification.
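The reliability check can be reproduced in a few lines with statsmodels; the toy rating table below is illustrative, not corpus data:

```python
# Sketch: Fleiss' kappa over three raters' binary judgments for one category.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = comments, columns = raters; values = assigned category (0/1)
ratings = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 1],
])
table, _ = aggregate_raters(ratings)  # per-comment counts for each category
print("Fleiss' kappa:", fleiss_kappa(table))
```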
https://arxiv.org/abs/2411.09612
South Africa and the Democratic Republic of Congo (DRC) present a complex linguistic landscape with languages such as Zulu, Sepedi, Afrikaans, French, English, and Tshiluba (Ciluba), which creates unique challenges for AI-driven translation and sentiment analysis systems due to a lack of accurately labeled data. This study seeks to address these challenges by developing a multilingual lexicon designed for French and Tshiluba, now expanded to include translations in English, Afrikaans, Sepedi, and Zulu. The lexicon enhances cultural relevance in sentiment classification by integrating language-specific sentiment scores. A comprehensive testing corpus is created to support translation and sentiment analysis tasks, with machine learning models such as Random Forest, Support Vector Machine (SVM), Decision Trees, and Gaussian Naive Bayes (GNB) trained to predict sentiment across low resource languages (LRLs). Among them, the Random Forest model performed particularly well, capturing sentiment polarity and handling language-specific nuances effectively. Furthermore, Bidirectional Encoder Representations from Transformers (BERT), a Large Language Model (LLM), is applied to predict context-based sentiment with high accuracy, achieving 99% accuracy and 98% precision, outperforming other models. The BERT predictions were clarified using Explainable AI (XAI), improving transparency and fostering confidence in sentiment classification. Overall, findings demonstrate that the proposed lexicon and machine learning models significantly enhance translation and sentiment analysis for LRLs in South Africa and the DRC, laying a foundation for future AI models that support underrepresented languages, with applications across education, governance, and business in multilingual contexts.
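A minimal sketch of the classical-learner setup, pairing character n-gram features (a common choice for low-resource, multilingual text, assumed here rather than taken from the study) with the Random Forest the study found strongest; the toy snippets and labels are illustrative only:

```python
# Sketch: multilingual sentiment classification with char n-grams + Random
# Forest. Toy data; character features sidestep language-specific tokenizers.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

texts = ["Le service était excellent",   # French, positive
         "The service was terrible",     # English, negative
         "Die diens was uitstekend",     # Afrikaans, positive
         "Le produit est décevant"]      # French, negative
labels = ["positive", "negative", "positive", "negative"]

clf = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                    RandomForestClassifier(n_estimators=200, random_state=0))
clf.fit(texts, labels)
print(clf.predict(["Le repas était excellent"]))
```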
https://arxiv.org/abs/2411.04316