Although counterfactual explanations are a popular approach to explain ML black-box classifiers, they are less widespread in NLP. Most methods find those explanations by iteratively perturbing the target document until it is classified differently by the black box. We identify two main families of counterfactual explanation methods in the literature, namely, (a) \emph{transparent} methods that perturb the target by adding, removing, or replacing words, and (b) \emph{opaque} approaches that project the target document into a latent, non-interpretable space where the perturbation is carried out subsequently. This article offers a comparative study of the performance of these two families of methods on three classical NLP tasks. Our empirical evidence shows that opaque approaches can be an overkill for downstream applications such as fake news detection or sentiment analysis since they add an additional level of complexity with no significant performance gain. These observations motivate our discussion, which raises the question of whether it makes sense to explain a black box using another black box.
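The "transparent" family described above can be pictured as a small greedy search loop. Below is a minimal, hypothetical sketch: the black-box classifier and the antonym lexicon are toy stand-ins, not any method evaluated in the paper.

```python
# A minimal "transparent" counterfactual search: greedily replace words until
# a (toy, hypothetical) black-box classifier flips its prediction.

def black_box(doc):
    # Toy sentiment classifier: positive iff more positive than negative cues.
    pos = {"great", "good", "excellent"}
    neg = {"bad", "poor", "terrible"}
    words = doc.lower().split()
    score = sum(w in pos for w in words) - sum(w in neg for w in words)
    return "positive" if score > 0 else "negative"

ANTONYMS = {"great": "terrible", "good": "bad", "excellent": "poor",
            "terrible": "great", "bad": "good", "poor": "excellent"}

def counterfactual(doc):
    original = black_box(doc)
    words = doc.split()
    for i, w in enumerate(words):          # perturb one word at a time
        sub = ANTONYMS.get(w.lower())
        if sub is None:
            continue
        candidate = " ".join(words[:i] + [sub] + words[i + 1:])
        if black_box(candidate) != original:   # label flipped: counterfactual found
            return candidate
    return None

print(counterfactual("the movie was great"))  # -> "the movie was terrible"
```

Opaque methods differ only in where the perturbation happens: instead of editing words, they move the document's latent vector until the decoded text flips the classifier.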
https://arxiv.org/abs/2404.14943
Aspect Based Sentiment Analysis (ABSA) tasks involve the extraction of fine-grained sentiment tuples from sentences, aiming to discern the author's opinions. Conventional methodologies predominantly rely on supervised approaches; however, the efficacy of such methods diminishes in low-resource domains lacking labeled datasets, since they often lack the ability to generalize across domains. To address this challenge, we propose a simple and novel unsupervised approach to extract opinion terms and the corresponding sentiment polarity for aspect terms in a sentence. Our experimental evaluations, conducted on four benchmark datasets, demonstrate compelling performance in extracting aspect-oriented opinion words and assigning sentiment polarity. Additionally, unsupervised approaches to opinion word mining have not been explored previously, and our work establishes a benchmark for this task.
https://arxiv.org/abs/2404.13751
In the burgeoning field of Large Language Models (LLMs) like ChatGPT and LLaMA, Prompt Engineering (PE) is renowned for boosting zero-shot or in-context learning (ICL) through prompt modifications. Yet, the realm of sample design for downstream fine-tuning, crucial for task-specific LLM adaptation, remains largely unexplored. This paper introduces Sample Design Engineering (SDE), a methodical approach to enhancing LLMs' post-tuning performance by refining input, output, and reasoning designs. We conduct a series of in-domain (ID) and out-of-domain (OOD) experiments to assess the impact of various design options on LLMs' downstream performance, revealing several intriguing patterns that hold consistently across different LLMs. Based on these insights, we propose an integrated SDE strategy, combining the most effective options, and validate its consistent superiority over heuristic sample designs in complex downstream tasks like multi-aspect sentiment analysis, event extraction, and nested entity recognition. Additionally, analyses of LLMs' inherent prompt/output perplexity, zero-shot, and ICL abilities illustrate that good PE strategies may not always translate to good SDE strategies. Code available at this https URL.
https://arxiv.org/abs/2404.13033
Deceptive reviews are becoming increasingly common, especially given the increase in performance and the prevalence of LLMs. While work to date has addressed the development of models to differentiate between truthful and deceptive human reviews, much less is known about the distinction between real reviews and AI-authored fake reviews. Moreover, most of the research so far has focused primarily on English, with very little work dedicated to other languages. In this paper, we compile and make publicly available the MAiDE-up dataset, consisting of 10,000 real and 10,000 AI-generated fake hotel reviews, balanced across ten languages. Using this dataset, we conduct extensive linguistic analyses to (1) compare the AI fake hotel reviews to real hotel reviews, and (2) identify the factors that influence the deception detection model performance. We explore the effectiveness of several models for deception detection in hotel reviews across three main dimensions: sentiment, location, and language. We find that these dimensions influence how well we can detect AI-generated fake reviews.
https://arxiv.org/abs/2404.12938
Automatic live video commenting has attracted increasing attention due to its significance in narration generation, topic explanation, etc. However, current methods do not account for the diversity of sentiment in the generated comments. Sentiment factors are critical in interactive commenting yet remain under-researched. Thus, in this paper, we propose a Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) network, consisting of a sentiment-oriented diversity encoder module and a batch attention module, to achieve diverse video commenting with multiple sentiments and multiple semantics. Specifically, our sentiment-oriented diversity encoder elegantly combines a VAE with a random mask mechanism to achieve semantic diversity under sentiment guidance, which is then fused with cross-modal features to generate live video comments. Furthermore, a batch attention module is proposed to alleviate the problem of missing sentiment samples caused by data imbalance, which is common in live videos because video popularity varies. Extensive experiments on the Livebot and VideoIC datasets demonstrate that the proposed So-TVAE outperforms state-of-the-art methods in terms of the quality and diversity of generated comments. Related code is available at this https URL.
https://arxiv.org/abs/2404.12782
In this paper, we propose a new Multimodal Representation Learning (MRL) method for Multimodal Sentiment Analysis (MSA), named Co-SA, which facilitates adaptive interaction between modalities through Cooperative Sentiment Agents. Co-SA comprises two critical components: the Sentiment Agents Establishment (SAE) phase and the Sentiment Agents Cooperation (SAC) phase. During the SAE phase, each sentiment agent deals with a unimodal signal and highlights explicit dynamic sentiment variations within the modality via the Modality-Sentiment Disentanglement (MSD) and Deep Phase Space Reconstruction (DPSR) modules. Subsequently, in the SAC phase, Co-SA meticulously designs task-specific interaction mechanisms for sentiment agents so as to coordinate multimodal signals and learn the joint representation. Specifically, Co-SA equips each sentiment agent with an independent policy model that captures significant properties within its modality. These policies are optimized jointly through a unified reward adapted to downstream tasks. Benefiting from the rewarding mechanism, Co-SA transcends the limitation of pre-defined fusion modes and adaptively captures unimodal properties for MRL in the multimodal interaction setting. To demonstrate the effectiveness of Co-SA, we apply it to the Multimodal Sentiment Analysis (MSA) and Multimodal Emotion Recognition (MER) tasks. Our comprehensive experimental results demonstrate that Co-SA excels at discovering diverse cross-modal features, encompassing both common and complementary aspects. The code is available at this https URL.
https://arxiv.org/abs/2404.12642
The advent of generative artificial intelligence (GenAI) technologies has revolutionized research, with significant implications for Digital Humanities (DH), a field inherently intertwined with technological progress. This article investigates how digital humanities scholars adopt, practice, and critically evaluate GenAI technologies such as ChatGPT in the research process. Drawing on 76 responses collected from an international survey study, we explored digital humanities scholars' rationale for GenAI adoption in research, identified specific use cases and practices of using GenAI to support various DH research tasks, and analyzed scholars' collective perceptions of GenAI's benefits, risks, and impact on DH research. The survey results suggest that DH research communities hold divided sentiments towards the value of GenAI in DH scholarship, whereas actual usage varies among individuals and across research tasks. Our survey-based analysis has the potential to serve as a basis for further empirical research on the impact of GenAI on the evolution of DH scholarship.
https://arxiv.org/abs/2404.12458
In this paper we investigate the use of decoder-based generative transformers for extracting sentiment towards the named entities in Russian news articles. We study the sentiment analysis capabilities of instruction-tuned large language models (LLMs). We consider the RuSentNE-2023 dataset in our study. The first group of experiments was aimed at evaluating the zero-shot capabilities of LLMs, both closed and open source. The second covers the fine-tuning of Flan-T5 using the "chain-of-thought" (CoT) three-hop reasoning framework (THoR). We found that the results of the zero-shot approaches are similar to those achieved by baseline fine-tuned encoder-based transformers (BERT-base). The reasoning capabilities of the fine-tuned Flan-T5 models with THoR yield at least a 5% improvement with the base-size model over the zero-shot results. The best sentiment analysis results on RuSentNE-2023 were achieved by fine-tuned Flan-T5-xl, which surpassed previous state-of-the-art transformer-based classifiers. Our CoT application framework is publicly available at this https URL.
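The three-hop idea can be sketched as a chain where each hop's answer is folded back into the next prompt. The templates and the stub "LLM" below are hypothetical illustrations, not the paper's exact THoR prompts.

```python
# Illustrative three-hop chain-of-thought in the spirit of THoR: locate the
# entity mention, infer the expressed opinion, then infer the polarity.

def three_hop(sentence, entity, llm):
    q1 = f"In '{sentence}', which words mention {entity}?"
    a1 = llm(q1)
    q2 = f"{q1} {a1}. What opinion does the text express about it?"
    a2 = llm(q2)
    q3 = f"{q2} {a2}. So the sentiment toward {entity} is?"
    return llm(q3)

# Stub standing in for a fine-tuned model: it just replays canned answers.
answers = iter(["'ACME Corp'", "guarded praise", "positive"])
print(three_hop("ACME Corp beat forecasts.", "ACME",
                lambda prompt: next(answers)))  # -> positive
```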
https://arxiv.org/abs/2404.12342
Fine-tuning pre-trained Large Language Models (LLMs) is essential to align them with human values and intentions. This process often utilizes methods like pairwise comparisons and KL divergence against a reference LLM, focusing on the evaluation of full answers generated by the models. However, the generation of these responses occurs at the token level, in a sequential, auto-regressive fashion. In this paper, we introduce Token-level Direct Preference Optimization (TDPO), a novel approach to align LLMs with human preferences by optimizing policy at the token level. Unlike previous methods, which face challenges in divergence efficiency, TDPO incorporates forward KL divergence constraints for each token, improving alignment and diversity. Utilizing the Bradley-Terry model for a token-based reward system, TDPO enhances the regulation of KL divergence, while preserving simplicity without the need for explicit reward modeling. Experimental results across various text tasks demonstrate TDPO's superior performance in balancing alignment with generation diversity. Notably, fine-tuning with TDPO strikes a better balance than DPO in the controlled sentiment generation and single-turn dialogue datasets, and significantly improves the quality of generated responses compared to both DPO and PPO-based RLHF methods. Our code is open-sourced at this https URL.
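The token-level forward-KL idea can be illustrated with a toy computation: penalize each generated token's reward by its divergence from the reference model. The distributions and the penalty weight below are hand-made stand-ins, not the paper's training objective.

```python
# Toy per-token forward-KL penalty: KL(pi_ref || pi_theta) computed at each
# step over a shared 3-word vocabulary, then folded into the token reward.
import math

def forward_kl(p_ref, p_theta):
    # KL(p_ref || p_theta) over a shared vocabulary.
    return sum(p * math.log(p / q) for p, q in zip(p_ref, p_theta) if p > 0)

def token_rewards(base_rewards, ref_dists, policy_dists, beta=0.1):
    # Penalize each token by its divergence from the reference model.
    return [r - beta * forward_kl(pr, pt)
            for r, pr, pt in zip(base_rewards, ref_dists, policy_dists)]

ref = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]]   # reference distributions per step
pol = [[0.6, 0.3, 0.1], [0.4, 0.4, 0.2]]   # current-policy distributions
print(token_rewards([1.0, 0.5], ref, pol))
```

A sequence-level method would instead apply one KL term to the whole answer; computing the constraint per token is what lets the penalty act on each auto-regressive step.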
https://arxiv.org/abs/2404.11999
Multimodal Sentiment Analysis (MSA) aims to identify speakers' sentiment tendencies in multimodal video content, raising serious concerns about privacy risks associated with multimodal data, such as voiceprints and facial images. Recent distributed collaborative learning has been verified as an effective paradigm for privacy preservation in multimodal tasks. However, such approaches often overlook the privacy distinctions among different modalities, struggling to strike a balance between performance and privacy preservation. Consequently, this raises an intriguing question: how can multimodal utilization be maximized to improve performance while the modalities that require it are protected? This paper forms the first attempt at modality-specified (i.e., audio and visual) privacy preservation in MSA tasks. We propose a novel Hybrid Distributed cross-modality cGAN framework (HyDiscGAN), which learns multimodal alignment to generate fake audio and visual features conditioned on shareable de-identified textual data. The objective is to leverage the fake features to approximate real audio and visual content, guaranteeing privacy preservation while effectively enhancing performance. Extensive experiments show that, compared with the state-of-the-art MSA model, HyDiscGAN can achieve superior or competitive performance while preserving privacy.
https://arxiv.org/abs/2404.11938
In the era of rapid evolution of generative language models within the realm of natural language processing, there is an imperative call to revisit and reformulate evaluation methodologies, especially in the domain of aspect-based sentiment analysis (ABSA). This paper addresses the emerging challenges introduced by the generative paradigm, which has moderately blurred traditional boundaries between understanding and generation tasks. Building upon prevailing practices in the field, we analyze the advantages and shortcomings associated with the prevalent ABSA evaluation paradigms. Through an in-depth examination, supplemented by illustrative examples, we highlight the intricacies involved in aligning generative outputs with other evaluative metrics, specifically those derived from other tasks, including question answering. While we steer clear of advocating for a singular and definitive metric, our contribution lies in paving the path for a comprehensive guideline tailored for ABSA evaluations in this generative paradigm. In this position paper, we aim to provide practitioners with profound reflections, offering insights and directions that can aid in navigating this evolving landscape, ensuring evaluations that are both accurate and reflective of generative capabilities.
https://arxiv.org/abs/2404.11539
Sentiment analysis (SA) aims to identify the sentiment expressed in a text, such as a product review. Given a review and the sentiment associated with it, this paper formulates SA as a combination of two tasks: (1) a causal discovery task that distinguishes whether a review "primes" the sentiment (Causal Hypothesis C1), or the sentiment "primes" the review (Causal Hypothesis C2); and (2) the traditional prediction task to model the sentiment using the review as input. Using the peak-end rule in psychology, we classify a sample as C1 if its overall sentiment score approximates an average of all the sentence-level sentiments in the review, and C2 if the overall sentiment score approximates an average of the peak and end sentiments. For the prediction task, we use the discovered causal mechanisms behind the samples to improve the performance of LLMs by proposing causal prompts that give the models an inductive bias of the underlying causal graph, leading to substantial improvements of up to 32.13 F1 points on zero-shot five-class SA. Our code is at this https URL
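The peak-end classification above is simple arithmetic once sentence-level scores exist. Below is a minimal sketch: sentence scores are assumed precomputed in [-1, 1], and the tie-breaking is illustrative rather than the paper's exact rule.

```python
# Peak-end classification: tag a review C1 if its overall score is closer to
# the mean of all sentence sentiments, C2 if it is closer to the mean of the
# peak (most extreme) and final sentence sentiments.

def classify_causal_direction(overall, sentence_scores):
    mean_all = sum(sentence_scores) / len(sentence_scores)
    peak = max(sentence_scores, key=abs)       # most extreme sentiment
    end = sentence_scores[-1]
    peak_end = (peak + end) / 2
    # Assign the hypothesis whose summary the overall score approximates best.
    return "C1" if abs(overall - mean_all) <= abs(overall - peak_end) else "C2"

# Overall score tracks the average of every sentence -> C1.
print(classify_causal_direction(0.2, [0.1, 0.9, -0.4, 0.2]))   # -> C1
# Overall score tracks only the peak and the ending -> C2.
print(classify_causal_direction(0.55, [0.1, 0.9, -0.4, 0.2]))  # -> C2
```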
https://arxiv.org/abs/2404.11055
The field of natural language processing (NLP) has made significant progress with the rapid development of deep learning technologies. One of the research directions in text sentiment analysis is sentiment analysis of medical texts, which holds great potential for application in clinical diagnosis. However, the medical field currently lacks sufficient text datasets, and the effectiveness of sentiment analysis is greatly impacted by different model design approaches, which presents challenges. Therefore, this paper focuses on the medical domain, using bidirectional encoder representations from transformers (BERT) as the basic pre-trained model and experimenting with modules such as convolutional neural network (CNN), fully connected network (FCN), and graph convolutional networks (GCN) at the output layer. Experiments and analyses were conducted on the METS-CoV dataset to explore the training performance after integrating different deep learning networks. The results indicate that CNN models outperform other networks when trained on smaller medical text datasets in combination with pre-trained models like BERT. This study highlights the significance of model selection in achieving effective sentiment analysis in the medical domain and provides a reference for future research to develop more efficient model architectures.
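The CNN output head compared above follows the classic text-CNN recipe: a 1-d convolution over token embeddings followed by max-over-time pooling. The tiny integer embeddings and single filter below stand in for BERT outputs and learned weights; this is a sketch, not the paper's architecture.

```python
# 1-d convolution over a token sequence plus max-over-time pooling,
# producing one feature per filter.

def conv_max_pool(seq, kern):
    width, dim = len(kern), len(seq[0])
    feats = []
    for i in range(len(seq) - width + 1):
        # Dot product of the filter with a sliding window of embeddings.
        feats.append(sum(kern[j][d] * seq[i + j][d]
                         for j in range(width) for d in range(dim)))
    return max(feats)  # max-over-time pooling

tokens = [[1, 0], [0, 1], [1, 1], [0, 0]]   # 4 tokens, embedding dim 2
kernel = [[1, 1], [1, -1]]                  # one width-2 filter
print(conv_max_pool(tokens, kernel))        # -> 2
```

In practice many filters of several widths run in parallel, and the pooled features feed a softmax classifier; an FCN or GCN head simply swaps this block for a different aggregator.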
https://arxiv.org/abs/2404.10503
With the growth of textual data across online platforms, sentiment analysis has become crucial for extracting insights from user-generated content. While traditional approaches and deep learning models have shown promise, they often cannot capture complex relationships between entities. In this paper, we propose leveraging Relational Graph Convolutional Networks (RGCNs) for sentiment analysis, which offer interpretability and flexibility by capturing dependencies between data points represented as nodes in a graph. We demonstrate the effectiveness of our approach by using pre-trained language models such as BERT and RoBERTa with an RGCN architecture on product reviews from the Amazon and Digikala datasets and evaluating the results. Our experiments highlight the effectiveness of RGCNs in capturing relational information for sentiment analysis tasks.
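One relational graph convolution step can be sketched compactly: each node aggregates its neighbors separately per relation type, plus a self-loop term. Real RGCNs use a weight matrix per relation; the scalar weights, node names, and relations below are toy stand-ins to keep the sketch readable.

```python
# One RGCN-style layer over scalar node features: per-relation weighted,
# mean-normalized aggregation of incoming neighbors, plus a self-loop.

def rgcn_layer(features, edges, rel_weight, self_weight=1.0):
    # features: node -> scalar feature; edges: list of (src, rel, dst).
    out = {n: self_weight * f for n, f in features.items()}
    counts = {}  # incoming-edge counts per (dst, rel) for mean normalization
    for src, rel, dst in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1
    for src, rel, dst in edges:
        out[dst] += rel_weight[rel] * features[src] / counts[(dst, rel)]
    return out

feats = {"review": 1.0, "product": 0.5, "user": -0.2}
edges = [("product", "about", "review"), ("user", "wrote", "review")]
print(rgcn_layer(feats, edges, {"about": 0.3, "wrote": 0.7}))
```

In the paper's setting, node features would be BERT/RoBERTa embeddings of reviews and related entities, and the fused representations would feed the sentiment classifier.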
https://arxiv.org/abs/2404.13079
Large Language Models (LLMs) are already as persuasive as humans. However, we know very little about why. This paper investigates the persuasion strategies of LLMs, comparing them with human-generated arguments. Using a dataset of 1,251 participants in an experiment, we analyze the persuasion strategies of LLM-generated and human-generated arguments using measures of cognitive effort (lexical and grammatical complexity) and moral-emotional language (sentiment and moral analysis). The study reveals that LLMs produce arguments that require higher cognitive effort, exhibiting more complex grammatical and lexical structures than human counterparts. Additionally, LLMs demonstrate a significant propensity to engage more deeply with moral language, utilizing both positive and negative moral foundations more frequently than humans. In contrast with previous research, no significant difference was found in the emotional content produced by LLMs and humans. These findings contribute to the discourse on AI and persuasion, highlighting the dual potential of LLMs to both enhance and undermine informational integrity through communication strategies for digital persuasion.
https://arxiv.org/abs/2404.09329
The explosive growth of online content demands robust Natural Language Processing (NLP) techniques that can capture nuanced meanings and cultural context across diverse languages. Semantic Textual Relatedness (STR) goes beyond superficial word overlap, considering linguistic elements and non-linguistic factors like topic, sentiment, and perspective. Despite its pivotal role, prior NLP research has predominantly focused on English, limiting its applicability across languages. Addressing this gap, our paper dives into capturing deeper connections between sentences beyond simple word overlap. Going beyond English-centric NLP research, we explore STR in Marathi, Hindi, Spanish, and English, unlocking the potential for information retrieval, machine translation, and more. Leveraging the SemEval-2024 shared task, we explore various language models across three learning paradigms: supervised, unsupervised, and cross-lingual. Our comprehensive methodology gains promising results, demonstrating the effectiveness of our approach. This work aims to not only showcase our achievements but also inspire further research in multilingual STR, particularly for low-resourced languages.
https://arxiv.org/abs/2404.09047
Multimodal video sentiment analysis aims to integrate multiple modal information to analyze the opinions and attitudes of speakers. Most previous work focuses on exploring the semantic interactions of intra- and inter-modality. However, these works ignore the reliability of multimodality, i.e., modalities tend to contain noise, semantic ambiguity, missing modalities, etc. In addition, previous multimodal approaches treat different modalities equally, largely ignoring their different contributions. Furthermore, existing multimodal sentiment analysis methods directly regress sentiment scores without considering ordinal relationships within sentiment categories, limiting performance. To address the aforementioned problems, we propose a trustworthy multimodal sentiment ordinal network (TMSON) to improve performance in sentiment analysis. Specifically, we first devise a unimodal feature extractor for each modality to obtain modality-specific features. Then, an uncertainty distribution estimation network is customized, which estimates the unimodal uncertainty distributions. Next, Bayesian fusion is performed on the learned unimodal distributions to obtain multimodal distributions for sentiment prediction. Finally, an ordinal-aware sentiment space is constructed, where ordinal regression is used to constrain the multimodal distributions. Our proposed TMSON outperforms baselines on multimodal sentiment analysis tasks, and empirical results demonstrate that TMSON is capable of reducing uncertainty to obtain more robust predictions.
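The Bayesian fusion step can be illustrated with the standard product-of-Gaussians rule: per-modality (mean, variance) estimates combine with precision weights, so noisier modalities contribute less. The numbers below are illustrative, and this is a generic fusion sketch rather than TMSON's exact formulation.

```python
# Precision-weighted fusion of per-modality Gaussian sentiment estimates.

def fuse_gaussians(estimates):
    # estimates: list of (mean, variance) pairs, one per modality.
    precisions = [1.0 / var for _, var in estimates]
    fused_prec = sum(precisions)
    fused_mean = sum(m * p for (m, _), p in zip(estimates, precisions)) / fused_prec
    return fused_mean, 1.0 / fused_prec

# Text is confident (low variance), audio is noisy (high variance).
mean, var = fuse_gaussians([(0.8, 0.1), (0.2, 0.9)])
print(round(mean, 3), round(var, 3))  # fused mean stays near the confident modality
```

Note the fused variance is smaller than any single modality's, which is how fusion reduces uncertainty; the ordinal regression then constrains where the fused distribution may fall among sentiment categories.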
https://arxiv.org/abs/2404.08923
This paper describes submissions from the team Nostra Domina to the EvaLatin 2024 shared task of emotion polarity detection. Given the low-resource environment of Latin and the complexity of sentiment in rhetorical genres like poetry, we augmented the available data through automatic polarity annotation. We present two methods for doing so on the basis of the $k$-means algorithm, and we employ a variety of Latin large language models (LLMs) in a neural architecture to better capture the underlying contextual sentiment representations. Our best approach achieved the second-highest macro-averaged $F_1$ score on the shared task's test set.
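The annotation idea can be sketched as: cluster sentence embeddings with $k$-means, then propagate polarity labels from a few seed sentences to every member of their cluster. The 2-d "embeddings", seeds, and deterministic initialization below are toy stand-ins, not the paper's Latin LLM representations or its exact two methods.

```python
# k-means over toy 2-d sentence embeddings, with seed-label propagation.

def kmeans(points, k, iters=10):
    # Deterministic init: spread the initial centers across the data order.
    centers = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest center
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):  # recompute centers as cluster means
            if cl:
                centers[j] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return clusters

points = [(-1.1, 0.1), (-0.9, -0.2), (-1.0, 0.0), (1.0, 0.1), (0.9, -0.1), (1.2, 0.0)]
seeds = {(-1.0, 0.0): "negative", (1.0, 0.1): "positive"}
for cluster in kmeans(points, k=2):
    label = next((seeds[p] for p in cluster if p in seeds), "unknown")
    print(label, len(cluster))  # every clustered sentence inherits the seed's label
```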
https://arxiv.org/abs/2404.07792
Dehumanisation involves the perception and/or treatment of a social group's members as less than human. This phenomenon is rarely addressed with computational linguistic techniques. We adapt a recently proposed approach for English, making it easier to transfer to other languages and to evaluate, introducing a new sentiment resource, the use of zero-shot cross-lingual valence and arousal detection, and a new method for statistical significance testing. We then apply it to study attitudes to migration expressed in Slovene newspapers, to examine changes in the Slovene discourse on migration between the 2015-16 migration crisis following the war in Syria and the 2022-23 period following the war in Ukraine. We find that while this discourse became more negative and more intense over time, it is less dehumanising when specifically addressing Ukrainian migrants compared to others.
https://arxiv.org/abs/2404.07036
The purpose of emotion-cause pair extraction is to extract pairs of emotion clauses and cause clauses. On the one hand, existing methods do not fully account for the relationship between emotion-cause pair extraction and its two auxiliary tasks, emotion extraction and cause extraction. On the other hand, existing two-stage models suffer from error propagation. In addition, existing models do not adequately address the positional imbalance of emotion and cause clauses across samples. To solve these problems, an end-to-end multitask model (MM-ECPE) based on shared interaction among GRU, knowledge-graph, and transformer modules is proposed. Furthermore, building on MM-ECPE, and in order to let the encoder layer better handle the imbalanced distribution of distances between cause clauses and emotion clauses, we propose MM-ECPE(BERT), an emotion-cause pair retrieval model with a novel encoding layer based on BERT, a sentiment lexicon, and a position-aware interaction module. The model first fully models the interaction between tasks through a multi-level sharing module, mining the information shared among emotion-cause pair extraction, emotion extraction, and cause extraction. Second, to address the imbalanced distribution of emotion and cause clauses, suitable labels are screened according to knowledge-graph path length, and task-specific features are constructed so that the model can focus on extracting pairs with genuine emotion-cause relationships. Experimental results on the ECPE benchmark dataset show that the proposed model achieves good performance, especially on position-imbalanced samples.
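The pairing stage common to ECPE systems can be sketched as candidate generation with a relative-position prior: cause clauses tend to sit near their emotion clause, so distant pairs are down-weighted or pruned. The scoring function and distance cutoff below are illustrative stand-ins, not MM-ECPE's learned components.

```python
# Emotion-cause candidate pairing with a simple positional prior:
# pair every emotion clause index with every cause clause index, score
# by distance, and drop pairs beyond a cutoff.

def candidate_pairs(emotion_idx, cause_idx, max_dist=2):
    pairs = []
    for e in emotion_idx:
        for c in cause_idx:
            dist = abs(e - c)
            if dist <= max_dist:                 # positional prior as a hard filter
                pairs.append(((e, c), 1.0 / (1 + dist)))
    return sorted(pairs, key=lambda x: -x[1])    # best-scored candidates first

# Emotion clause at position 3; cause candidates at 1, 2, 3, and 7.
print(candidate_pairs([3], [1, 2, 3, 7]))
```

A model like MM-ECPE replaces this hand-made prior with learned position-aware features, which is precisely what helps on position-imbalanced samples where the prior alone fails.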
https://arxiv.org/abs/2404.06812