Neural Machine Translation (NMT) is the task of translating a text from one language to another with the use of a trained neural network. Several existing works aim at incorporating external information into NMT models to improve or control predicted translations (e.g. sentiment, politeness, gender). In this work, we propose to improve translation quality by adding another external source of information: the automatically recognized emotion in the voice. This work is motivated by the assumption that each emotion is associated with a specific lexicon that can overlap between emotions. Our proposed method follows a two-stage procedure. At first, we select a state-of-the-art Speech Emotion Recognition (SER) model to predict dimensional emotion values from all input audio in the dataset. Then, we use these predicted emotions as source tokens added at the beginning of input texts to train our NMT model. We show that integrating emotion information, especially arousal, into NMT systems leads to better translations.
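The tagging scheme described above can be illustrated with a small sketch: a predicted dimensional emotion value (here, arousal) is discretized into a special token prepended to the source text. The token names and the binning thresholds below are hypothetical choices for illustration, not taken from the paper.

```python
# Hypothetical sketch of prepending an emotion token to an NMT source sentence.
# Token names and thresholds are illustrative assumptions, not the paper's.

def arousal_token(arousal: float) -> str:
    """Map a continuous arousal score in [0, 1] to a coarse source token."""
    if arousal >= 0.66:
        return "<arousal_high>"
    if arousal >= 0.33:
        return "<arousal_mid>"
    return "<arousal_low>"

def tag_source(text: str, arousal: float) -> str:
    """Prepend the emotion token so the NMT encoder can condition on it."""
    return f"{arousal_token(arousal)} {text}"

print(tag_source("I can't believe we won!", 0.9))
# -> <arousal_high> I can't believe we won!
```

In this style of conditioning, the special tokens are simply added to the source vocabulary and learned like any other token during NMT training.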
https://arxiv.org/abs/2404.17968
Numerous capability and safety techniques of Large Language Models (LLMs), including RLHF, automated red-teaming, prompt engineering, and infilling, can be cast as sampling from an unnormalized target distribution defined by a given reward or potential function over the full sequence. In this work, we leverage the rich toolkit of Sequential Monte Carlo (SMC) for these probabilistic inference problems. In particular, we use learned twist functions to estimate the expected future value of the potential at each timestep, which enables us to focus inference-time computation on promising partial sequences. We propose a novel contrastive method for learning the twist functions, and establish connections with the rich literature of soft reinforcement learning. As a complementary application of our twisted SMC framework, we present methods for evaluating the accuracy of language model inference techniques using novel bidirectional SMC bounds on the log partition function. These bounds can be used to estimate the KL divergence between the inference and target distributions in both directions. We apply our inference evaluation techniques to show that twisted SMC is effective for sampling undesirable outputs from a pretrained model (a useful component of harmlessness training and automated red-teaming), generating reviews with varied sentiment, and performing infilling tasks.
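One step of twisted SMC can be sketched in miniature: particles (partial sequences) are extended by a proposal, reweighted by the ratio of twist values (standing in for the learned estimate of the expected future potential), and resampled so that computation concentrates on promising prefixes. The toy twist function below is a hypothetical stand-in, not the paper's learned twist.

```python
import random

# Minimal sketch of one twisted-SMC step over partial character sequences.
# The twist function is a toy proxy (counting a target character) for a
# learned estimate of the expected future potential.

def twist(partial: str) -> float:
    return 1.0 + partial.count("a")  # hypothetical stand-in for a learned twist

def smc_step(particles: list[str], vocab: str, rng: random.Random) -> list[str]:
    # 1) Extend each particle by one token from a proposal (uniform here).
    extended = [p + rng.choice(vocab) for p in particles]
    # 2) Weight each extension by the twist ratio.
    weights = [twist(e) / twist(p) for e, p in zip(extended, particles)]
    # 3) Resample proportionally to the weights.
    total = sum(weights)
    return rng.choices(extended, weights=[w / total for w in weights],
                       k=len(particles))

rng = random.Random(0)
particles = [""] * 8
for _ in range(5):
    particles = smc_step(particles, "ab", rng)
print(particles)  # resampling favors 'a'-rich prefixes on average
```

In the paper's setting, the proposal is the language model itself and the twist values are learned, but the extend/weight/resample loop has the same shape.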
https://arxiv.org/abs/2404.17546
As NLP models become more complex, understanding their decisions becomes more crucial. Counterfactuals (CFs), where minimal changes to inputs flip a model's prediction, offer a way to explain these models. While Large Language Models (LLMs) have shown remarkable performance in NLP tasks, their efficacy in generating high-quality CFs remains uncertain. This work fills this gap by investigating how well LLMs generate CFs for two NLU tasks. We conduct a comprehensive comparison of several common LLMs, and evaluate their CFs, assessing both intrinsic metrics, and the impact of these CFs on data augmentation. Moreover, we analyze differences between human and LLM-generated CFs, providing insights for future research directions. Our results show that LLMs generate fluent CFs, but struggle to keep the induced changes minimal. Generating CFs for Sentiment Analysis (SA) is less challenging than NLI where LLMs show weaknesses in generating CFs that flip the original label. This also reflects on the data augmentation performance, where we observe a large gap between augmenting with human and LLMs CFs. Furthermore, we evaluate LLMs' ability to assess CFs in a mislabelled data setting, and show that they have a strong bias towards agreeing with the provided labels. GPT4 is more robust against this bias and its scores correlate well with automatic metrics. Our findings reveal several limitations and point to potential future work directions.
https://arxiv.org/abs/2405.00722
This paper explores the importance of text sentiment analysis and classification in the field of natural language processing, and proposes a new approach to sentiment analysis and classification based on the bidirectional gated recurrent units (GRUs) model. The study first analyses the word cloud model of the text with six sentiment labels, and then carries out data preprocessing, including removing special symbols, punctuation marks, numbers, stop words and non-alphabetic parts. Subsequently, the dataset is divided into a training set and a test set. Through model training and testing, the accuracy on the validation set increases from 85% to 93% over the course of training, an improvement of 8 percentage points; at the same time, the validation loss decreases from 0.7 to 0.1 and stabilises, showing that the model's predictions gradually approach the actual values and that it can effectively classify text emotions. The confusion matrix shows that on the test set the model reaches an accuracy of 94.8%, a precision of 95.9%, a recall of 99.1%, and an F1 score of 97.4%, which indicates that the model has good generalisation ability and classification performance. Overall, the study demonstrates an effective method for text sentiment analysis and classification with satisfactory results.
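The reported scores follow the standard confusion-matrix definitions; a quick sketch of those formulas is below. The counts used are hypothetical and only illustrate the definitions, not the paper's actual data.

```python
# Standard confusion-matrix metrics as reported in the abstract.
# The counts below are hypothetical, for illustration only.

def metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    precision = tp / (tp + fp)            # of predicted positives, how many are right
    recall = tp / (tp + fn)               # of actual positives, how many are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

m = metrics(tp=95, fp=5, fn=2, tn=98)
print({k: round(v, 3) for k, v in m.items()})
```

Note that because F1 is the harmonic mean of precision and recall, it always lies between the two, consistent with the 97.4% figure falling between the reported 95.9% precision and 99.1% recall.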
https://arxiv.org/abs/2404.17123
Multimodal sentiment analysis (MSA) aims to understand human sentiment through multimodal data. Most MSA efforts are based on the assumption of modality completeness. However, in real-world applications, some practical factors cause uncertain modality missingness, which drastically degrades the model's performance. To this end, we propose a Correlation-decoupled Knowledge Distillation (CorrKD) framework for the MSA task under uncertain missing modalities. Specifically, we present a sample-level contrastive distillation mechanism that transfers comprehensive knowledge containing cross-sample correlations to reconstruct missing semantics. Moreover, a category-guided prototype distillation mechanism is introduced to capture cross-category correlations using category prototypes to align feature distributions and generate favorable joint representations. Eventually, we design a response-disentangled consistency distillation strategy to optimize the sentiment decision boundaries of the student network through response disentanglement and mutual information maximization. Comprehensive experiments on three datasets indicate that our framework can achieve favorable improvements compared with several baselines.
https://arxiv.org/abs/2404.16456
Although counterfactual explanations are a popular approach to explain ML black-box classifiers, they are less widespread in NLP. Most methods find those explanations by iteratively perturbing the target document until it is classified differently by the black box. We identify two main families of counterfactual explanation methods in the literature, namely, (a) \emph{transparent} methods that perturb the target by adding, removing, or replacing words, and (b) \emph{opaque} approaches that project the target document into a latent, non-interpretable space where the perturbation is carried out subsequently. This article offers a comparative study of the performance of these two families of methods on three classical NLP tasks. Our empirical evidence shows that opaque approaches can be an overkill for downstream applications such as fake news detection or sentiment analysis since they add an additional level of complexity with no significant performance gain. These observations motivate our discussion, which raises the question of whether it makes sense to explain a black box using another black box.
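The "transparent" family can be sketched with a toy example: greedily replace words until a black-box classifier flips its label. The classifier and the substitution lexicon below are hypothetical stand-ins for a real model and a real perturbation strategy.

```python
# Sketch of a "transparent" counterfactual search: iteratively replace words
# until a toy black-box classifier flips its label. Both the classifier and
# the swap lexicon are hypothetical, for illustration only.

SWAPS = {"great": "terrible", "love": "hate", "good": "bad"}

def toy_classifier(text: str) -> str:
    negative = {"terrible", "hate", "bad"}
    words = text.lower().split()
    return "neg" if sum(w in negative for w in words) > len(words) / 4 else "pos"

def counterfactual(text: str) -> str:
    original = toy_classifier(text)
    words = text.split()
    for i, w in enumerate(words):          # perturb one word at a time
        if w.lower() in SWAPS:
            words[i] = SWAPS[w.lower()]
            if toy_classifier(" ".join(words)) != original:
                return " ".join(words)     # stop at the edit that flips the label
    return " ".join(words)

print(counterfactual("I love this great movie"))
# -> I hate this terrible movie
```

Every edit here is a human-readable word replacement, which is exactly the interpretability advantage the transparent family has over latent-space perturbation.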
https://arxiv.org/abs/2404.14943
Aspect-Based Sentiment Analysis (ABSA) tasks involve the extraction of fine-grained sentiment tuples from sentences, aiming to discern the author's opinions. Conventional methodologies predominantly rely on supervised approaches; however, the efficacy of such methods diminishes in low-resource domains lacking labeled datasets, since they often fail to generalize across domains. To address this challenge, we propose a simple and novel unsupervised approach to extract opinion terms and the corresponding sentiment polarity for aspect terms in a sentence. Our experimental evaluations, conducted on four benchmark datasets, demonstrate compelling performance in extracting aspect-oriented opinion words as well as in assigning sentiment polarity. Additionally, unsupervised approaches for opinion word mining have not been explored before, and our work establishes a benchmark for this task.
https://arxiv.org/abs/2404.13751
In the burgeoning field of Large Language Models (LLMs) like ChatGPT and LLaMA, Prompt Engineering (PE) is renowned for boosting zero-shot or in-context learning (ICL) through prompt modifications. Yet sample design for downstream fine-tuning, crucial for task-specific LLM adaptation, remains largely unexplored. This paper introduces Sample Design Engineering (SDE), a methodical approach to enhancing LLMs' post-tuning performance by refining input, output, and reasoning designs. We conduct a series of in-domain (ID) and out-of-domain (OOD) experiments to assess the impact of various design options on LLMs' downstream performance, revealing several intriguing patterns that hold consistently across different LLMs. Based on these insights, we propose an integrated SDE strategy that combines the most effective options, and validate its consistent superiority over heuristic sample designs in complex downstream tasks like multi-aspect sentiment analysis, event extraction, and nested entity recognition. Additionally, analyses of LLMs' inherent prompt/output perplexity, zero-shot, and ICL abilities illustrate that good PE strategies do not always translate to good SDE strategies. Code available at this https URL.
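What "sample design" means in practice can be illustrated by rendering the same multi-aspect sentiment example under two different input/output designs. The field names and formats below are illustrative assumptions, not the designs evaluated in the paper.

```python
import json

# Hypothetical illustration of fine-tuning "sample design": the same
# multi-aspect sentiment example under two output designs. Field names and
# wording are illustrative, not taken from the paper.

review = "The screen is gorgeous but the battery dies fast."
aspects = {"screen": "positive", "battery": "negative"}

# Design A: natural-language output.
sample_a = {
    "input": f"Review: {review}\nList each aspect and its sentiment.",
    "output": "; ".join(f"{a}: {s}" for a, s in aspects.items()),
}

# Design B: structured (JSON) output, often easier to parse downstream.
sample_b = {
    "input": f"Review: {review}\nReturn aspect sentiments as JSON.",
    "output": json.dumps(aspects),
}

print(sample_a["output"])
print(sample_b["output"])
```

The paper's point is that seemingly minor choices of this kind (input phrasing, output format, whether to include reasoning) can shift post-tuning performance consistently across LLMs.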
https://arxiv.org/abs/2404.13033
Deceptive reviews are becoming increasingly common, especially given the increase in performance and the prevalence of LLMs. While work to date has addressed the development of models to differentiate between truthful and deceptive human reviews, much less is known about the distinction between real reviews and AI-authored fake reviews. Moreover, most of the research so far has focused primarily on English, with very little work dedicated to other languages. In this paper, we compile and make publicly available the MAiDE-up dataset, consisting of 10,000 real and 10,000 AI-generated fake hotel reviews, balanced across ten languages. Using this dataset, we conduct extensive linguistic analyses to (1) compare the AI fake hotel reviews to real hotel reviews, and (2) identify the factors that influence the deception detection model performance. We explore the effectiveness of several models for deception detection in hotel reviews across three main dimensions: sentiment, location, and language. We find that these dimensions influence how well we can detect AI-generated fake reviews.
https://arxiv.org/abs/2404.12938
Automatic live video commenting has attracted increasing attention due to its significance in narration generation, topic explanation, and related applications. However, current methods do not account for the sentiment diversity of the generated comments. Sentiment factors are critical in interactive commenting, yet they have received little research attention so far. Thus, in this paper, we propose a Sentiment-oriented Transformer-based Variational Autoencoder (So-TVAE) network, which consists of a sentiment-oriented diversity encoder module and a batch attention module, to achieve diverse video commenting with multiple sentiments and multiple semantics. Specifically, our sentiment-oriented diversity encoder elegantly combines a VAE with a random mask mechanism to achieve semantic diversity under sentiment guidance, which is then fused with cross-modal features to generate live video comments. Furthermore, a batch attention module is proposed to alleviate the problem of missing sentiment samples caused by data imbalance, which is common in live videos since the popularity of videos varies. Extensive experiments on the Livebot and VideoIC datasets demonstrate that the proposed So-TVAE outperforms state-of-the-art methods in terms of the quality and diversity of generated comments. Related code is available at this https URL.
https://arxiv.org/abs/2404.12782
In this paper, we propose a new Multimodal Representation Learning (MRL) method for Multimodal Sentiment Analysis (MSA), which facilitates the adaptive interaction between modalities through Cooperative Sentiment Agents, named Co-SA. Co-SA comprises two critical components: the Sentiment Agents Establishment (SAE) phase and the Sentiment Agents Cooperation (SAC) phase. During the SAE phase, each sentiment agent deals with a unimodal signal and highlights explicit dynamic sentiment variations within the modality via the Modality-Sentiment Disentanglement (MSD) and Deep Phase Space Reconstruction (DPSR) modules. Subsequently, in the SAC phase, Co-SA meticulously designs task-specific interaction mechanisms for the sentiment agents so as to coordinate multimodal signals in learning the joint representation. Specifically, Co-SA equips each sentiment agent with an independent policy model that captures significant properties within its modality. These policies are optimized jointly through a unified reward adapted to downstream tasks. Benefiting from this rewarding mechanism, Co-SA transcends the limitation of pre-defined fusion modes and adaptively captures unimodal properties for MRL in the multimodal interaction setting. To demonstrate the effectiveness of Co-SA, we apply it to Multimodal Sentiment Analysis (MSA) and Multimodal Emotion Recognition (MER) tasks. Our comprehensive experimental results demonstrate that Co-SA excels at discovering diverse cross-modal features, encompassing both common and complementary aspects. The code is available at this https URL.
https://arxiv.org/abs/2404.12642
The advent of generative artificial intelligence (GenAI) technologies has revolutionized research, with significant implications for Digital Humanities (DH), a field inherently intertwined with technological progress. This article investigates how digital humanities scholars adopt, practice, as well as critically evaluate, GenAI technologies such as ChatGPT in the research process. Drawing on 76 responses collected from an international survey study, we explored digital humanities scholars' rationale for GenAI adoption in research, identified specific use cases and practices of using GenAI to support various DH research tasks, and analyzed scholars' collective perceptions of GenAI's benefits, risks, and impact on DH research. The survey results suggest that DH research communities hold divisive sentiments towards the value of GenAI in DH scholarship, whereas the actual usage diversifies among individuals and across research tasks. Our survey-based analysis has the potential to serve as a basis for further empirical research on the impact of GenAI on the evolution of DH scholarship.
https://arxiv.org/abs/2404.12458
In this paper, we investigate the use of decoder-based generative transformers for extracting sentiment towards named entities in Russian news articles. We study the sentiment analysis capabilities of instruction-tuned large language models (LLMs), using the RuSentNE-2023 dataset. The first group of experiments evaluates the zero-shot capabilities of LLMs with closed and open transparency. The second covers fine-tuning Flan-T5 using the "chain-of-thought" (CoT) three-hop reasoning framework (THoR). We find that the zero-shot approaches achieve results similar to those of baseline fine-tuned encoder-based transformers (BERT-base). With THoR, the fine-tuned base-size Flan-T5 model gains at least 5 points over the zero-shot results. The best sentiment analysis results on RuSentNE-2023 were achieved by fine-tuned Flan-T5-xl, which surpassed previous state-of-the-art transformer-based classifiers. Our CoT application framework is publicly available: this https URL
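The "three-hop" structure can be sketched as a chain of prompts in which each hop extends the previous one with the model's intermediate answer. The exact wording of the hops below is a hypothetical illustration in the spirit of THoR, not the framework's actual prompts.

```python
# Hypothetical sketch of three-hop chain-of-thought prompting: each hop
# appends the previous hop plus a placeholder for the model's answer.
# The hop wording is illustrative, not taken from THoR.

def thor_prompts(sentence: str, entity: str) -> list[str]:
    context = f"Sentence: {sentence}"
    hop1 = f"{context}\nWhich aspect of {entity} is discussed?"
    hop2 = hop1 + "\n{answer1}\nWhat is the underlying opinion about it?"
    hop3 = hop2 + "\n{answer2}\nWhat is the sentiment towards " + entity + "?"
    return [hop1, hop2, hop3]

hops = thor_prompts("The ministry praised the new policy.", "the ministry")
print(len(hops))  # 3 prompts, each extending the previous hop
```

During fine-tuning, the model is trained to produce the intermediate answers as well, so the final polarity decision is conditioned on its own reasoning steps.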
https://arxiv.org/abs/2404.12342
Fine-tuning pre-trained Large Language Models (LLMs) is essential to align them with human values and intentions. This process often utilizes methods like pairwise comparisons and KL divergence against a reference LLM, focusing on the evaluation of full answers generated by the models. However, the generation of these responses occurs in a token level, following a sequential, auto-regressive fashion. In this paper, we introduce Token-level Direct Preference Optimization (TDPO), a novel approach to align LLMs with human preferences by optimizing policy at the token level. Unlike previous methods, which face challenges in divergence efficiency, TDPO incorporates forward KL divergence constraints for each token, improving alignment and diversity. Utilizing the Bradley-Terry model for a token-based reward system, TDPO enhances the regulation of KL divergence, while preserving simplicity without the need for explicit reward modeling. Experimental results across various text tasks demonstrate TDPO's superior performance in balancing alignment with generation diversity. Notably, fine-tuning with TDPO strikes a better balance than DPO in the controlled sentiment generation and single-turn dialogue datasets, and significantly improves the quality of generated responses compared to both DPO and PPO-based RLHF methods. Our code is open-sourced at this https URL.
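The per-token forward KL constraint at the heart of TDPO can be sketched for a single token position. The distributions below are toy vectors over a three-token vocabulary, not real model outputs.

```python
import math

# Sketch of a per-token forward KL term, KL(pi_ref || pi_theta), evaluated at
# one position. The toy 3-token distributions are hypothetical examples.

def forward_kl(p_ref: list[float], p_theta: list[float]) -> float:
    """KL(p_ref || p_theta) = sum_i p_ref[i] * log(p_ref[i] / p_theta[i])."""
    return sum(p * math.log(p / q) for p, q in zip(p_ref, p_theta) if p > 0)

ref = [0.7, 0.2, 0.1]      # reference model's next-token distribution
policy = [0.5, 0.3, 0.2]   # fine-tuned policy's next-token distribution

print(round(forward_kl(ref, policy), 4))
```

Summing such terms over every position of a generated sequence gives a token-level divergence penalty, in contrast to methods that only evaluate the full answer.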
https://arxiv.org/abs/2404.11999
Multimodal Sentiment Analysis (MSA) aims to identify speakers' sentiment tendencies in multimodal video content, raising serious concerns about privacy risks associated with multimodal data, such as voiceprints and facial images. Recently, distributed collaborative learning has been verified as an effective paradigm for privacy preservation in multimodal tasks. However, such methods often overlook the privacy distinctions among different modalities and struggle to strike a balance between performance and privacy preservation. This raises the intriguing question of how to maximize multimodal utilization to improve performance while simultaneously protecting the necessary modalities. This paper forms the first attempt at modality-specified (i.e., audio and visual) privacy preservation in MSA tasks. We propose a novel Hybrid Distributed cross-modality cGAN framework (HyDiscGAN), which learns multimodality alignment to generate fake audio and visual features conditioned on shareable de-identified textual data. The objective is to leverage the fake features to approximate real audio and visual content to guarantee privacy preservation while effectively enhancing performance. Extensive experiments show that, compared with state-of-the-art MSA models, HyDiscGAN can achieve superior or competitive performance while preserving privacy.
https://arxiv.org/abs/2404.11938
In the era of rapid evolution of generative language models within the realm of natural language processing, there is an imperative call to revisit and reformulate evaluation methodologies, especially in the domain of aspect-based sentiment analysis (ABSA). This paper addresses the emerging challenges introduced by the generative paradigm, which has moderately blurred traditional boundaries between understanding and generation tasks. Building upon prevailing practices in the field, we analyze the advantages and shortcomings associated with the prevalent ABSA evaluation paradigms. Through an in-depth examination, supplemented by illustrative examples, we highlight the intricacies involved in aligning generative outputs with other evaluative metrics, specifically those derived from other tasks, including question answering. While we steer clear of advocating for a singular and definitive metric, our contribution lies in paving the path for a comprehensive guideline tailored for ABSA evaluations in this generative paradigm. In this position paper, we aim to provide practitioners with profound reflections, offering insights and directions that can aid in navigating this evolving landscape, ensuring evaluations that are both accurate and reflective of generative capabilities.
https://arxiv.org/abs/2404.11539
Sentiment analysis (SA) aims to identify the sentiment expressed in a text, such as a product review. Given a review and the sentiment associated with it, this paper formulates SA as a combination of two tasks: (1) a causal discovery task that distinguishes whether a review "primes" the sentiment (Causal Hypothesis C1), or the sentiment "primes" the review (Causal Hypothesis C2); and (2) the traditional prediction task to model the sentiment using the review as input. Using the peak-end rule in psychology, we classify a sample as C1 if its overall sentiment score approximates an average of all the sentence-level sentiments in the review, and C2 if the overall sentiment score approximates an average of the peak and end sentiments. For the prediction task, we use the discovered causal mechanisms behind the samples to improve the performance of LLMs by proposing causal prompts that give the models an inductive bias of the underlying causal graph, leading to substantial improvements by up to 32.13 F1 points on zero-shot five-class SA. Our code is at this https URL
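The peak-end heuristic described above can be sketched directly: compare the overall score to (a) the mean of all sentence-level sentiments and (b) the mean of the peak and end sentiments, and assign the hypothesis whose summary is closer. The toy scores below and the use of absolute value to pick the "peak" are illustrative assumptions.

```python
# Sketch of the paper's peak-end causal labeling. Sentence scores are toy
# values in [-1, 1]; treating the most extreme sentence as the "peak" is an
# illustrative assumption.

def causal_hypothesis(sentence_scores: list[float], overall: float) -> str:
    mean_all = sum(sentence_scores) / len(sentence_scores)
    peak = max(sentence_scores, key=abs)            # most extreme sentence
    peak_end = (peak + sentence_scores[-1]) / 2     # peak-end average
    # C1: overall tracks the mean of all sentences; C2: it tracks peak-end.
    return "C1" if abs(overall - mean_all) <= abs(overall - peak_end) else "C2"

# A review whose overall score matches its peak and final sentences:
print(causal_hypothesis([0.1, 0.9, 0.7], overall=0.8))  # -> C2
```

Once each sample is labeled C1 or C2, the corresponding causal graph can be described in the prompt, which is the inductive bias the paper reports as improving zero-shot performance.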
https://arxiv.org/abs/2404.11055
The field of natural language processing (NLP) has made significant progress with the rapid development of deep learning technologies. One of the research directions in text sentiment analysis is sentiment analysis of medical texts, which holds great potential for application in clinical diagnosis. However, the medical field currently lacks sufficient text datasets, and the effectiveness of sentiment analysis is greatly impacted by different model design approaches, which presents challenges. Therefore, this paper focuses on the medical domain, using bidirectional encoder representations from transformers (BERT) as the basic pre-trained model and experimenting with modules such as convolutional neural network (CNN), fully connected network (FCN), and graph convolutional networks (GCN) at the output layer. Experiments and analyses were conducted on the METS-CoV dataset to explore the training performance after integrating different deep learning networks. The results indicate that CNN models outperform other networks when trained on smaller medical text datasets in combination with pre-trained models like BERT. This study highlights the significance of model selection in achieving effective sentiment analysis in the medical domain and provides a reference for future research to develop more efficient model architectures.
https://arxiv.org/abs/2404.10503
With the growth of textual data across online platforms, sentiment analysis has become crucial for extracting insights from user-generated content. While traditional approaches and deep learning models have shown promise, they often cannot capture complex relationships between entities. In this paper, we propose leveraging Relational Graph Convolutional Networks (RGCNs) for sentiment analysis, which offer interpretability and flexibility by capturing dependencies between data points represented as nodes in a graph. We demonstrate the effectiveness of our approach by using pre-trained language models such as BERT and RoBERTa with the RGCN architecture on product reviews from the Amazon and Digikala datasets and evaluating the results. Our experiments highlight the effectiveness of RGCNs in capturing relational information for sentiment analysis tasks.
https://arxiv.org/abs/2404.13079
Large Language Models (LLMs) are already as persuasive as humans. However, we know very little about why. This paper investigates the persuasion strategies of LLMs, comparing them with human-generated arguments. Using a dataset of 1,251 participants in an experiment, we analyze the persuasion strategies of LLM-generated and human-generated arguments using measures of cognitive effort (lexical and grammatical complexity) and moral-emotional language (sentiment and moral analysis). The study reveals that LLMs produce arguments that require higher cognitive effort, exhibiting more complex grammatical and lexical structures than their human counterparts. Additionally, LLMs demonstrate a significant propensity to engage more deeply with moral language, utilizing both positive and negative moral foundations more frequently than humans. In contrast with previous research, no significant difference was found in the emotional content produced by LLMs and humans. These findings contribute to the discourse on AI and persuasion, highlighting the dual potential of LLMs to both enhance and undermine informational integrity through communication strategies for digital persuasion.
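Lexical complexity of the kind measured here can be approximated with simple surface statistics, such as type-token ratio and mean word length. These particular metrics are illustrative choices, not necessarily the ones used in the study.

```python
import re

# Illustrative lexical-complexity measures: type-token ratio and mean word
# length. These are common proxies, assumed here for illustration; the study
# may use different measures.

def lexical_stats(text: str) -> dict[str, float]:
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    return {
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "mean_word_length": sum(map(len, tokens)) / len(tokens),
    }

simple = "It is good. It is very good."
dense = "The argument's epistemological ramifications remain contested."
print(lexical_stats(simple)["type_token_ratio"]
      < lexical_stats(dense)["type_token_ratio"])  # True
```

A higher type-token ratio and longer average words indicate text that demands more cognitive effort to process, which is the dimension on which the study finds LLM arguments exceed human ones.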
https://arxiv.org/abs/2404.09329