Foundation models have shown outstanding performance and generalization capabilities across domains. Since most studies on foundation models mainly focus on the pretraining phase, a naive strategy to minimize a single task-specific loss is adopted for fine-tuning. However, such fine-tuning methods do not fully leverage other losses that are potentially beneficial for the target task. Therefore, we propose MEta Loss TRansformer (MELTR), a plug-in module that automatically and non-linearly combines various loss functions to aid learning the target task via auxiliary learning. We formulate the auxiliary learning as a bi-level optimization problem and present an efficient optimization algorithm based on Approximate Implicit Differentiation (AID). For evaluation, we apply our framework to various video foundation models (UniVL, Violet and All-in-one), and show significant performance gain on all four downstream tasks: text-to-video retrieval, video question answering, video captioning, and multi-modal sentiment analysis. Our qualitative analyses demonstrate that MELTR adequately 'transforms' individual loss functions and 'melts' them into an effective unified loss. Code is available at this https URL.
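As a sketch in generic notation (not necessarily the paper's exact symbols), the bi-level formulation lets a meta network \(\phi\) combine the \(K\) pretraining losses in the inner problem while the outer problem optimizes the target loss:

\[
\min_{\phi}\ \mathcal{L}_{\text{target}}\big(\theta^{*}(\phi)\big)
\quad\text{s.t.}\quad
\theta^{*}(\phi) = \arg\min_{\theta}\ g_{\phi}\big(\mathcal{L}_{1}(\theta),\ldots,\mathcal{L}_{K}(\theta)\big),
\]

where \(g_{\phi}\) is the non-linear loss-combining function (here, a Transformer). AID approximates the hypergradient \(\partial \mathcal{L}_{\text{target}} / \partial \phi\) via the implicit function theorem rather than unrolling the inner optimization.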
https://arxiv.org/abs/2303.13009
This study evaluates the robustness of two state-of-the-art deep contextual language representations, ELMo and DistilBERT, on supervised learning of binary protest news classification and sentiment analysis of product reviews. A "cross-context" setting is enabled using test sets that are distinct from the training data. Specifically, in the news classification task, the models are developed on local news from India and tested on local news from China. In the sentiment analysis task, the models are trained on movie reviews and tested on customer reviews. This comparison is aimed at exploring the limits of the representative power of today's Natural Language Processing systems on the path to systems that generalize to real-life scenarios. The models are fine-tuned and fed into a Feed-Forward Neural Network and a Bidirectional Long Short-Term Memory network. Multinomial Naive Bayes and Linear Support Vector Machine are used as traditional baselines. The results show that, in binary text classification, DistilBERT generalizes significantly better than ELMo to the cross-context setting. ELMo is observed to be significantly more robust to the cross-context test data than both baselines. On the other hand, the baselines performed comparably well to ELMo when the training and test data are subsets of the same corpus (no cross-context). DistilBERT is also found to be 30% smaller and 83% faster than ELMo. The results suggest that DistilBERT can transfer generic semantic knowledge to other domains better than ELMo. DistilBERT is also favorable for incorporation into real-life systems, as it requires a smaller computational training budget. When generalization is not the utmost preference and the test domain is similar to the training domain, traditional ML algorithms can still be considered more economical alternatives to deep language representations.
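A minimal sketch of the encoder-plus-FFNN setup described above, assuming the Hugging Face transformers and PyTorch APIs; the model name, layer sizes, and example texts are illustrative, not the study's exact configuration:

```python
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# DistilBERT as the contextual encoder, with a small feed-forward head
# on top, mirroring the FFNN classifier setup (sizes are assumptions).
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoder = AutoModel.from_pretrained("distilbert-base-uncased")
ffnn = nn.Sequential(nn.Linear(768, 128), nn.ReLU(), nn.Linear(128, 2))

texts = ["Protesters gathered outside the ministry on Friday.",
         "The battery died within a week, very disappointing."]
batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
hidden = encoder(**batch).last_hidden_state[:, 0]  # first-token summary vector
logits = ffnn(hidden)                              # binary class scores
```

Cross-context evaluation then amounts to fitting this head on one corpus (e.g., Indian news) and scoring it on another (e.g., Chinese news).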
https://arxiv.org/abs/2303.12936
A positive phrase or sentence with an underlying negative motive is usually defined as sarcasm, which is widely used on today's social media platforms such as Facebook, Twitter, and Reddit. In recent times, the number of active users on social media platforms has increased dramatically, raising the need for automated NLP-based systems that can be utilized in various tasks such as determining market demand, sentiment analysis, and threat detection. However, since sarcasm usually implies the opposite meaning and its detection is frequently challenging, extracting meaning from data with an NLP-based model becomes more complicated. As a result, there has been a lot of study on sarcasm detection in English over the past several years with noticeable improvement, yet the state of sarcasm detection in the Bangla language remains unchanged. In this article, we present a BERT-based system that achieves 99.60% accuracy, while the traditional machine learning algorithms we evaluated are only capable of achieving 89.93%. Additionally, we have employed Local Interpretable Model-Agnostic Explanations (LIME) to introduce explainability to our system. Moreover, we have utilized a newly collected Bangla sarcasm dataset, BanglaSarc, that was constructed specifically for the evaluation of this study. This dataset consists of fresh records of sarcastic and non-sarcastic comments, the majority of which were acquired from Facebook and YouTube comment sections.
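Since the paper reports using Local Interpretable Model-Agnostic Explanations (LIME), here is a minimal sketch with the lime package; the classifier below is a stand-in for the fine-tuned BERT pipeline, not the authors' model:

```python
import numpy as np
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    # Stand-in scorer: any callable mapping raw strings to class
    # probabilities works here; in the paper this would wrap BERT.
    p = np.array([[min(len(t) / 200, 1.0)] for t in texts])
    return np.hstack([1 - p, p])

explainer = LimeTextExplainer(class_names=["non-sarcastic", "sarcastic"])
exp = explainer.explain_instance("Oh great, another Monday.",
                                 predict_proba, num_features=5)
print(exp.as_list())  # per-word weights behind the prediction
```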
https://arxiv.org/abs/2303.12772
Considering a conversation thread, stance classification aims to identify the opinion (e.g. agree or disagree) of replies towards a given target. The target of the stance is expected to be an essential component in this task, being one of the main factors that make it different from sentiment analysis. However, a recent study shows that a target-oblivious model outperforms target-aware models, suggesting that targets are not useful when predicting stance. This paper re-examines this phenomenon for rumour stance classification (RSC) on social media, where a target is a rumour story implied by the source tweet in the conversation. We propose adversarial attacks in the test data, aiming to assess the models' robustness and evaluate the role of the data in the models' performance. Results show that state-of-the-art models, including approaches that use the entire conversation thread, overly rely on superficial signals. Our hypothesis is that the naturally high occurrence of target-independent direct replies in RSC (e.g. "this is fake" or just "fake") results in the impressive performance of target-oblivious models, highlighting the risk of target instances being treated as noise during training.
https://arxiv.org/abs/2303.12665
We generated 25,000 conversations labeled with Big Five personality traits using prompt programming with GPT-3. We then train Big Five classification models with these data and evaluate them on 2,500 items from the generated dialogues and from real conversational datasets labeled with the Big Five by human annotators. The results indicate that this approach is promising for creating effective training data. We then compare the performance of different training approaches and models. Our results suggest that using Adapter-Transformers and transfer learning from a pre-trained RoBERTa sentiment analysis model performs best with the generated data. Our best model obtained an accuracy of 0.71 on the generated data and 0.65 on the real datasets. Finally, we discuss the approach's potential limitations and its confidence metric.
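The paper's exact prompt template is not reproduced here; the snippet below is a hypothetical illustration of prompt programming for trait-conditioned dialogue generation:

```python
TRAITS = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

def build_prompt(levels):
    """levels: one high/low flag per Big Five trait (illustrative)."""
    profile = ", ".join(f"{'high' if v else 'low'} {t}"
                        for t, v in zip(TRAITS, levels))
    return (f"Write a short two-person conversation in which Speaker A "
            f"displays {profile}. Label each line with its speaker.")

print(build_prompt([1, 0, 1, 1, 0]))
# The completion returned by GPT-3 would then be stored together with
# `levels` as its Big Five label, yielding one training example.
```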
https://arxiv.org/abs/2303.12279
The continuous improvement of human-computer interaction technology makes it possible to compute emotions. In this paper, we introduce our submission to the CVPR 2023 Competition on Affective Behavior Analysis in-the-wild (ABAW). Sentiment analysis in human-computer interaction should, as far as possible, start from multiple dimensions, filling in the single imperfect emotion channel and finally determining the emotion tendency by fitting multiple results. Therefore, we exploited multimodal features extracted from videos of different lengths in the competition dataset, including audio, pose and images. These well-informed emotion representations drive us to propose an attention-based multimodal framework for emotion estimation. Our system achieves a performance of 0.361 on the validation dataset. The code is available at [this https URL].
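A minimal PyTorch sketch of attention-based fusion of per-modality features; the feature dimensions, output head, and pooling are assumptions for illustration, not the submission's exact architecture:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, dims, d_model=256, heads=4):
        super().__init__()
        # Project each modality (audio, pose, image) to a shared space.
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in dims])
        self.attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)  # assumed scalar emotion estimate

    def forward(self, feats):  # feats: list of (B, d_i) modality vectors
        toks = torch.stack([p(f) for p, f in zip(self.proj, feats)], dim=1)
        fused, _ = self.attn(toks, toks, toks)  # modalities attend to each other
        return self.head(fused.mean(dim=1))

model = AttentionFusion(dims=[128, 64, 512])  # audio, pose, image (assumed)
out = model([torch.randn(8, 128), torch.randn(8, 64), torch.randn(8, 512)])
```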
https://arxiv.org/abs/2303.10421
Memes are the new-age conveyance mechanism for humor on social media sites. Memes often include an image and some text, and can be used to promote disinformation or hatred, so it is crucial to investigate them in detail. We introduce Memotion 3, a new dataset with 10,000 annotated memes. Unlike other prevalent datasets in the domain, including prior iterations of Memotion, Memotion 3 introduces Hindi-English code-mixed memes, whereas prior works in the area were limited to English memes only. We describe the Memotion task, the data collection, and the dataset creation methodologies. We also provide a baseline for the task. The baseline code and dataset will be made available at this https URL
https://arxiv.org/abs/2303.09892
In this paper, we present our solutions to the two sub-challenges of Affective Behavior Analysis in-the-wild (ABAW) 2023: the Emotional Reaction Intensity (ERI) Estimation Challenge and the Expression (Expr) Classification Challenge. ABAW 2023 focuses on the problem of affective behavior analysis in the wild, with the goal of creating machines and robots that have the ability to understand human feelings, emotions and behaviors, which can effectively contribute to the advent of a more intelligent future. In our work, we use different models and tools on the Hume-Reaction dataset to extract features of various aspects, such as audio features, video features, etc. By analyzing, combining, and studying these multimodal features, we effectively improve the accuracy of the model for multimodal sentiment prediction. For the Emotional Reaction Intensity (ERI) Estimation Challenge, our method shows excellent results, with a Pearson coefficient on the validation dataset that exceeds the baseline method by 84 percent.
https://arxiv.org/abs/2303.09164
Transfer learning plays an essential role in Deep Learning, as it can remarkably improve performance on a target domain whose training data is insufficient. Our work explores beyond the common practice of transfer learning with a single pre-trained model. We focus on the task of Vietnamese sentiment classification and propose LIFA, a framework to learn a unified embedding from several pre-trained models. We further propose two more LIFA variants that encourage the pre-trained models to either cooperate or compete with one another. Studying these variants sheds light on the success of LIFA by showing that sharing knowledge among the models is more beneficial for transfer learning. Moreover, we construct the AISIA-VN-Review-F dataset, the first large-scale Vietnamese sentiment classification database. We conduct extensive experiments on AISIA-VN-Review-F and existing benchmarks to demonstrate the efficacy of LIFA compared to other techniques. To contribute to Vietnamese NLP research, we will publish our source code and datasets to the research community upon acceptance.
https://arxiv.org/abs/2303.09115
Sentiment classification is a fundamental task in the field of Natural Language Processing, with very important academic and commercial applications. It aims to automatically predict the degree of sentiment present in a text that contains opinions and subjectivity at some level, like product and movie reviews, or tweets. This can be really difficult to accomplish, in part, because different domains of text contain different words and expressions. In addition, this difficulty increases when text is written in a non-English language due to the lack of databases and resources. As a consequence, several cross-domain and cross-language techniques are often applied to this task in order to improve the results. In this work we study the ability of a classification system trained with a large database of product reviews to generalize to different Spanish domains. Reviews were collected from the MercadoLibre website from seven Latin American countries, allowing the creation of a large and balanced dataset. Results suggest that generalization across domains is feasible, though very challenging, when training with these product reviews, and can be improved by pre-training and fine-tuning the classification model.
https://arxiv.org/abs/2303.08985
Long-sequence transformers are designed to improve the representation of longer texts by language models and their performance on downstream document-level tasks. However, not much is understood about the quality of token-level predictions in long-form models. We investigate the performance of such architectures in the context of document classification with unsupervised rationale extraction. We find standard soft attention methods to perform significantly worse when combined with the Longformer language model. We propose a compositional soft attention architecture that applies RoBERTa sentence-wise to extract plausible rationales at the token-level. We find this method to significantly outperform Longformer-driven baselines on sentiment classification datasets, while also exhibiting significantly lower runtimes.
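A minimal sketch of the compositional idea (sentence-wise RoBERTa encoding plus a token-level soft-attention head); the pooling, attention form, and top-k readout are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("roberta-base")
enc = AutoModel.from_pretrained("roberta-base")
att = nn.Linear(768, 1)  # token-level soft attention (untrained here)

def token_rationales(sentences, top_k=3):
    rationales = []
    for s in sentences:  # each sentence encoded independently
        batch = tok(s, return_tensors="pt", truncation=True)
        h = enc(**batch).last_hidden_state[0]          # (T, 768)
        w = torch.softmax(att(h).squeeze(-1), dim=0)   # (T,) attention weights
        ids = w.topk(min(top_k, len(w))).indices
        pieces = tok.convert_ids_to_tokens(batch["input_ids"][0].tolist())
        rationales.append([pieces[i] for i in ids])    # highest-weight tokens
    return rationales

print(token_rationales(["The plot dragged terribly.", "The cast was superb."]))
```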
https://arxiv.org/abs/2303.07991
I propose a novel dual-attention model (DAM) for aspect-level sentiment classification. Many methods have been proposed, such as support vector machines with manually designed features, long short-term memory networks based on attention mechanisms, and graph neural networks based on dependency parsing. While these methods all have decent performance, I think they all miss one important piece of syntactic information: dependency labels. Based on this idea, this paper proposes a model that uses dependency labels in the attention mechanism for this task. We evaluate the proposed approach on three datasets: the laptop and restaurant datasets from SemEval 2014, and a Twitter dataset. Experimental results show that the dual-attention model performs well on all three datasets.
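A hypothetical sketch of one way to inject dependency labels into attention: a learned per-label bias is added to the attention logits. The label inventory (from spaCy) and the bias form are assumptions for illustration, not the paper's model:

```python
import spacy
import torch
import torch.nn as nn

nlp = spacy.load("en_core_web_sm")  # provides token.dep_ labels

class DepBiasedAttention(nn.Module):
    def __init__(self, d_model, n_labels):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.label_bias = nn.Embedding(n_labels, 1)  # one bias per dep label

    def forward(self, h, dep_ids):  # h: (T, d), dep_ids: (T,)
        logits = self.q(h) @ self.k(h).T / h.size(-1) ** 0.5
        logits = logits + self.label_bias(dep_ids).T  # bias key j by its label
        return torch.softmax(logits, dim=-1) @ h

doc = nlp("The battery is great but the screen is dim.")
labels = sorted({t.dep_ for t in doc})
dep_ids = torch.tensor([labels.index(t.dep_) for t in doc])
attn = DepBiasedAttention(d_model=64, n_labels=len(labels))
out = attn(torch.randn(len(doc), 64), dep_ids)  # (T, 64) label-aware mixing
```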
https://arxiv.org/abs/2303.07689
The use of transfer learning methods is largely responsible for the present breakthrough in Natural Language Processing (NLP) tasks across multiple domains. In order to solve the problem of sentiment detection, we examined the performance of four well-known state-of-the-art transformer models for text classification: Bidirectional Encoder Representations from Transformers (BERT), the Robustly Optimized BERT Pre-training Approach (RoBERTa), a distilled version of BERT (DistilBERT), and a large bidirectional neural network architecture (XLNet). The performance of the four models in detecting disaster in text was compared. All the models performed well enough, indicating that transformer-based models are suitable for detecting disaster in text. The RoBERTa transformer model performs best on the test dataset, with a score of 82.6%, and is highly recommended for quality predictions. Furthermore, we discovered that the learning algorithms' performance was influenced by the pre-processing techniques, the nature of words in the vocabulary, unbalanced labeling, and the model parameters.
https://arxiv.org/abs/2303.07292
Two studies tested the hypothesis that a Large Language Model (LLM) can be used to model psychological change following exposure to influential input. The first study tested a generic mode of influence - the Illusory Truth Effect (ITE) - where earlier exposure to a statement (through, for example, rating its interest) boosts a later truthfulness test rating. Data was collected from 1000 human participants using an online experiment, and 1000 simulated participants using engineered prompts and LLM completion. 64 ratings per participant were collected, using all exposure-test combinations of the attributes: truth, interest, sentiment and importance. The results for human participants reconfirmed the ITE, and demonstrated an absence of effect for attributes other than truth, and when the same attribute is used for exposure and test. The same pattern of effects was found for LLM-simulated participants. The second study concerns a specific mode of influence - populist framing of news to increase its persuasion and political mobilization. Data from LLM-simulated participants was collected and compared to previously published data from a 15-country experiment on 7286 human participants. Several effects previously demonstrated from the human study were replicated by the simulated study, including effects that surprised the authors of the human study by contradicting their theoretical expectations (anti-immigrant framing of news decreases its persuasion and mobilization); but some significant relationships found in human data (modulation of the effectiveness of populist framing according to relative deprivation of the participant) were not present in the LLM data. Together the two studies support the view that LLMs have potential to act as models of the effect of influence.
https://arxiv.org/abs/2303.06074
Today, the web has become an indispensable platform for users to express opinions, emotions and feelings about various events. Anyone with a smartphone can post an opinion about the purchase of a product, the occurrence of an accident, the outbreak of a new disease, and so on, in blogs and social networks such as Twitter, WhatsApp, Telegram and Instagram. Millions of comments are therefore recorded daily, creating a huge volume of unstructured text data from which useful knowledge can be extracted using natural language processing methods. Sentiment analysis is one of the important applications of natural language processing and machine learning, allowing us to analyze the sentiments of comments and other textual information recorded by web users. In the following, we explain sentiment analysis and the approaches and challenges in this field.
https://arxiv.org/abs/2303.11176
The goal of Speech Emotion Recognition (SER) is to enable computers to recognize the emotion category of a given utterance in the same way that humans do. The accuracy of SER strongly depends on the validity of the utterance-level representation obtained by the model. Nevertheless, the "dark knowledge" carried by non-target classes has been ignored by previous studies. In this paper, we propose a hierarchical network, called DKDFMH, which employs decoupled knowledge distillation in a deep convolutional neural network with a fused multi-head attention mechanism. Our approach applies logit distillation to obtain higher-level semantic features from different scales of attention sets and delves into the knowledge carried by non-target classes, thus guiding the model to focus more on the differences between sentiment features. To validate the effectiveness of our model, we conducted experiments on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset. We achieved competitive performance, with 79.1% weighted accuracy (WA) and 77.1% unweighted accuracy (UA). To the best of our knowledge, this is the first time since 2015 that logit distillation has returned to state-of-the-art status.
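For reference, decoupled knowledge distillation splits the classical logit-distillation loss into a target-class and a non-target-class term; a sketch in the standard notation (the paper's exact weighting may differ):

\[
\mathcal{L}_{\mathrm{DKD}} \;=\; \alpha\,\mathrm{KL}\big(\mathbf{b}^{T}\,\|\,\mathbf{b}^{S}\big) \;+\; \beta\,\mathrm{KL}\big(\hat{\mathbf{p}}^{T}\,\|\,\hat{\mathbf{p}}^{S}\big),
\]

where \(\mathbf{b}\) holds the binary (target vs. non-target) probabilities, \(\hat{\mathbf{p}}\) is the distribution over non-target classes only, and \(T\), \(S\) denote teacher and student. The second term carries exactly the "dark knowledge" of non-target classes that the paper emphasizes.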
https://arxiv.org/abs/2303.05134
We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training. This task is of significant importance in automated design, especially drug discovery, where the goal is to design novel proteins that are better (e.g., more stable) than existing sequences. Thus, by definition, the target sequences and their attribute values are out of the training distribution, posing challenges to existing methods that aim to directly generate the target sequence. Instead, in this work, we propose Iterative Controlled Extrapolation (ICE) which iteratively makes local edits to a sequence to enable extrapolation. We train the model on synthetically generated sequence pairs that demonstrate small improvement in the attribute value. Results on one natural language task (sentiment analysis) and two protein engineering tasks (ACE2 stability and AAV fitness) show that ICE considerably outperforms state-of-the-art approaches despite its simplicity. Our code and models are available at: this https URL.
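A toy sketch of the iterative local-editing loop: propose small edits, keep the one that raises the attribute score, and stop when no edit helps. The scorer and edit proposer below are stand-ins for the paper's learned components:

```python
def score(seq):
    # Toy attribute scorer; in the paper this is a learned model.
    return sum(tok == "good" for tok in seq)

def propose_edits(seq):
    # Toy proposer: single-token substitutions from a tiny vocabulary.
    vocab = ["good", "bad", "fine"]
    return [seq[:i] + [w] + seq[i + 1:]
            for i in range(len(seq)) for w in vocab]

def ice(seq, steps=5):
    for _ in range(steps):
        best = max(propose_edits(seq), key=score)
        if score(best) <= score(seq):
            break  # no local edit improves the attribute further
        seq = best
    return seq

print(ice("this movie was bad".split()))  # edits drift toward higher score
```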
https://arxiv.org/abs/2303.04562
Detecting human-object interactions (HOIs) is a challenging problem in computer vision. Existing techniques for HOI detection heavily rely on appearance-based features, which may not capture other essential characteristics for accurate detection. Furthermore, the use of transformer-based models for sentiment representation of human-object pairs can be computationally expensive. To address these challenges, we propose a novel graph-based approach, SKGHOI (Spatial-Semantic Knowledge Graph for Human-Object Interaction Detection), that effectively captures the sentiment representation of HOIs by integrating both spatial and semantic knowledge. In a graph, SKGHOI takes the components of interaction as nodes, and the spatial relationships between them as edges. Our approach employs a spatial encoder and a semantic encoder to extract spatial and semantic information, respectively, and then combines these encodings to create a knowledge graph that captures the sentiment representation of HOIs. Compared to existing techniques, SKGHOI is computationally efficient and allows for the incorporation of prior knowledge, making it practical for use in real-world applications. We demonstrate the effectiveness of our proposed method on the widely-used HICO-DET dataset, where it outperforms existing state-of-the-art graph-based methods by a significant margin. Our results indicate that the SKGHOI approach has the potential to significantly improve the accuracy and efficiency of HOI detection, and we anticipate that it will be of great interest to researchers and practitioners working on this challenging task.
https://arxiv.org/abs/2303.04253
The literature on aspect-based sentiment analysis (ABSA) has been overwhelmed by deep neural networks, yielding state-of-the-art results for ABSA. However, these deep models are susceptible to learning spurious correlations between input features and output labels, which in general suffer from poor robustness and generalization. In this paper, we propose a novel Contrastive Variational Information Bottleneck framework (called CVIB) to reduce spurious correlations for ABSA. The proposed CVIB framework is composed of an original network and a self-pruned network, and these two networks are optimized simultaneously via contrastive learning. Concretely, we employ the Variational Information Bottleneck (VIB) principle to learn an informative and compressed network (self-pruned network) from the original network, which discards the superfluous patterns or spurious correlations between input features and prediction labels. Then, self-pruning contrastive learning is devised to pull together semantically similar positive pairs and push away dissimilar pairs, where the representations of the anchor learned by the original and self-pruned networks respectively are regarded as a positive pair while the representations of two different sentences within a mini-batch are treated as a negative pair. Extensive experiments on five benchmark ABSA datasets demonstrate that our CVIB method achieves better performance than the strong competitors in terms of overall prediction performance, robustness, and generalization.
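For reference, the VIB objective used to learn the compressed self-pruned network takes, in its standard form (notation generic; \(r(z)\) a variational prior, \(\beta\) the compression trade-off):

\[
\mathcal{L}_{\mathrm{VIB}} \;=\; \mathbb{E}_{z \sim p(z \mid x)}\big[-\log q(y \mid z)\big] \;+\; \beta\,\mathrm{KL}\big(p(z \mid x)\,\|\,r(z)\big),
\]

which trades prediction accuracy against compression of the representation \(z\), discarding input patterns that do not help predict the label.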
https://arxiv.org/abs/2303.02846
This paper presents a novel approach for explainability in financial analysis by utilizing the Pearson correlation coefficient to establish a relationship between aspect-based sentiment analysis and stock prices. The proposed methodology involves constructing an aspect list from financial news articles and analyzing sentiment intensity scores for each aspect. These scores are then compared to the stock prices for the relevant companies using the Pearson coefficient to determine any significant correlations. The results indicate that the proposed approach provides a more detailed and accurate understanding of the relationship between sentiment analysis and stock prices, which can be useful for investors and financial analysts in making informed decisions. Additionally, this methodology offers a transparent and interpretable way to explain the sentiment analysis results and their impact on stock prices. Overall, the findings of this paper demonstrate the importance of explainability in financial analysis and highlight the potential benefits of utilizing the Pearson coefficient for analyzing aspect-based sentiment analysis and stock prices. The proposed approach offers a valuable tool for understanding the complex relationships between financial news sentiment and stock prices, providing a new perspective on the financial market and aiding in making informed investment decisions.
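A minimal sketch of the core computation with SciPy; the sentiment scores and prices below are illustrative placeholders, not data from the paper:

```python
from scipy.stats import pearsonr

# Daily sentiment intensity for one aspect (e.g., "earnings") extracted
# from news, paired with the company's closing prices (made-up numbers).
sentiment = [0.12, 0.45, -0.30, 0.28, 0.05, 0.51, -0.10]
prices = [101.2, 103.5, 99.8, 102.1, 101.0, 104.3, 100.5]

r, p = pearsonr(sentiment, prices)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
# A large |r| with a small p-value flags the aspect as tracking the stock.
```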
https://arxiv.org/abs/2303.02563