This paper presents BELT, a novel model and learning framework for the pivotal topic of brain-to-language translation research. Translating noninvasive brain signals into readable natural language has the potential to broaden the application scenarios of brain-computer interfaces (BCI) and advance the field as a whole. The critical problem in brain signal decoding, or brain-to-language translation, is the acquisition of semantically appropriate and discriminative EEG representations from a dataset of limited scale and quality. The proposed BELT method is a generic and efficient framework that bootstraps EEG representation learning using off-the-shelf large-scale pretrained language models (LMs). Leveraging a large LM's capacity for semantic understanding and zero-shot generalization, BELT utilizes LMs trained on Internet-scale datasets to bring significant improvements to the understanding of EEG signals. In particular, the BELT model is composed of a deep conformer encoder and a vector quantization encoder. Semantic EEG representations are obtained through a contrastive learning step that provides natural language supervision. We achieve state-of-the-art results on two key brain decoding tasks: brain-to-language translation and zero-shot sentiment classification. Specifically, our model surpasses the baseline model on the two tasks by 5.45% and over 10%, respectively, and achieves a 42.31% BLEU-1 score and 67.32% precision on the main evaluation metrics for translation and zero-shot sentiment classification.
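The abstract mentions a contrastive learning step with natural language supervision but does not spell out the objective. Below is a minimal, illustrative sketch of a CLIP-style symmetric InfoNCE loss over paired EEG and text embeddings; the function names, temperature value, and loss form are assumptions of ours, not the paper's exact formulation.

```python
import math

def contrastive_loss(eeg_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired (EEG, text) embeddings.

    eeg_embs, text_embs: lists of equal-length float vectors; row i of each
    list is assumed to be a matching pair. Illustrative sketch only.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def norm(a):
        return math.sqrt(dot(a, a)) or 1.0

    n = len(eeg_embs)
    # Cosine-similarity logits scaled by temperature.
    logits = [[dot(e, t) / (norm(e) * norm(t)) / temperature
               for t in text_embs] for e in eeg_embs]

    def cross_entropy(row, target):
        # Numerically stable log-softmax cross-entropy.
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        return log_z - row[target]

    # EEG->text and text->EEG directions, averaged.
    loss_e2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2e = sum(cross_entropy(cols[j], j) for j in range(n)) / n
    return (loss_e2t + loss_t2e) / 2
```

Correctly paired embeddings should give a much lower loss than mismatched ones, which is what drives the EEG encoder toward language-aligned representations.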
https://arxiv.org/abs/2309.12056
With the rapid development of big data and computing devices, low-latency automatic trading platforms based on real-time information acquisition have become the main components of the stock trading market, so the topic of quantitative trading has received widespread attention. In markets that are not strongly efficient, human emotions and expectations often dominate market trends and trading decisions. Therefore, this paper starts from the theory of emotion, taking East Money as an example: we crawl user comment titles from its corresponding stock forum and perform data cleaning. Subsequently, a natural language processing model, BERT, was constructed and fine-tuned using existing annotated datasets. The experimental results show that the fine-tuned model improves performance to varying degrees over both the original model and the baseline model. Based on this model, the crawled user comments are labeled with emotional polarity, and the obtained labels are combined with the Alpha191 model in a regression, yielding significant regression results. The regression model is then used to predict the average price change over the next five days, which serves as a signal to guide automatic trading. The experimental results show that incorporating emotional factors increased the return rate during the trading period by 73.8\% compared to the baseline, and by 32.41\% compared to the original Alpha191 model. Finally, we discuss the advantages and disadvantages of incorporating emotional factors into quantitative trading and suggest possible directions for future research.
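The pipeline described above ends with a regression whose prediction is used as a trading signal. A minimal sketch of that last step, assuming a simple linear form over a sentiment feature plus Alpha191-style factors (the weights, threshold, and function names are illustrative; the paper's fitted regression is not given in the abstract):

```python
def trade_signal(sentiment_score, factor_values, weights,
                 intercept=0.0, threshold=0.0):
    """Predict the 5-day average price change from a sentiment polarity
    feature plus factor values, then map the prediction to an action.

    Illustrative only: a real deployment would use the coefficients
    estimated by the paper's regression.
    """
    features = [sentiment_score] + list(factor_values)
    pred = intercept + sum(w * f for w, f in zip(weights, features))
    if pred > threshold:
        return pred, "buy"
    if pred < -threshold:
        return pred, "sell"
    return pred, "hold"
```

The point of the design is that the emotional-polarity label enters the factor regression as just one more feature, so the same signal machinery works with or without it, which makes the 73.8\% vs. baseline comparison straightforward.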
https://arxiv.org/abs/2309.11979
The Mixup method has proven to be a powerful data augmentation technique in Computer Vision, with many successors that perform image mixing in a guided manner. One interesting research direction is transferring the underlying Mixup idea to other domains, e.g. Natural Language Processing (NLP). Even though several methods already apply Mixup to textual data, there is still room for new, improved approaches. In this work, we introduce AttentionMix, a novel mixing method that relies on attention-based information. While the paper focuses on the BERT attention mechanism, the proposed approach can in principle be applied to any attention-based model. AttentionMix is evaluated on 3 standard sentiment classification datasets and in all three cases outperforms two benchmark approaches that utilize the Mixup mechanism, as well as the vanilla BERT method. The results confirm that attention-based information can be effectively used for data augmentation in the NLP domain.
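For reference, vanilla Mixup interpolates both inputs and labels with a ratio λ; a guided variant such as AttentionMix derives that ratio from attention information instead of sampling it blindly. A minimal sketch (the `attention_lambda` heuristic is our illustration, not the paper's exact formulation):

```python
import random

def mixup(x1, x2, y1, y2, alpha=0.4, lam=None):
    """Vanilla Mixup on feature vectors and one-hot labels.

    lam may be supplied directly (e.g. derived from attention weights, as
    an attention-guided variant would do); otherwise it is drawn from a
    Beta(alpha, alpha) distribution as in the original Mixup paper.
    """
    if lam is None:
        lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam

def attention_lambda(attn1, attn2):
    """Illustrative: set the mixing ratio from the relative attention mass
    the two examples receive (an assumption for demonstration)."""
    s1, s2 = sum(attn1), sum(attn2)
    return s1 / (s1 + s2)
```

Because labels are mixed with the same λ as inputs, the augmented example carries a soft target that reflects how much of each source it contains.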
https://arxiv.org/abs/2309.11104
In-context learning (ICL) with large language models on tasks with many labels is challenging due to the limited context window, which makes it difficult to fit a sufficient number of examples in the prompt. In this paper, we use a pre-trained dense retrieval model to bypass this limitation, giving the model only a partial view of the full label space for each inference call. Testing with recent open-source LLMs (OPT, LLaMA), we set new state-of-the-art performance in few-shot settings on three common intent classification datasets, with no fine-tuning. We also surpass fine-tuned performance on fine-grained sentiment classification in certain cases. We analyze performance across the number of in-context examples and different model scales, showing that larger models are necessary to effectively and consistently make use of larger context lengths for ICL. By running several ablations, we analyze the model's use of: a) the similarity of the in-context examples to the current input, b) the semantic content of the class names, and c) the correct correspondence between examples and labels. We demonstrate that all three are needed to varying degrees depending on the domain, contrary to certain recent works.
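The core mechanism above, retrieving only the most relevant labelled examples so the prompt covers a partial view of a large label space, can be sketched as follows. Cosine similarity over a caller-supplied `embed` function stands in for the paper's dense retriever, and the prompt template is an assumption:

```python
import math

def build_prompt(query, example_pool, embed, k=8):
    """Retrieve the k most similar labelled examples and format an ICL prompt.

    example_pool: list of (text, label) pairs; embed: any callable returning
    a vector. Illustrative stand-in for a pre-trained dense retriever.
    """
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = (math.sqrt(sum(x * x for x in a))
               * math.sqrt(sum(y * y for y in b)))
        return num / den if den else 0.0

    q = embed(query)
    ranked = sorted(example_pool,
                    key=lambda ex: cos(embed(ex[0]), q), reverse=True)
    shots = ranked[:k]
    # The model only ever sees the labels present in the retrieved shots --
    # a partial view of the full label space.
    lines = [f"Input: {t}\nLabel: {l}" for t, l in shots]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines), sorted({l for _, l in shots})
```

Returning the set of labels visible in the prompt makes it easy to verify how much of the label space each inference call actually exposes.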
https://arxiv.org/abs/2309.10954
As the impact of the COVID-19 pandemic winds down, both individuals and society gradually return to pre-pandemic activities. This study aims to explore how people's emotions changed from the pre-pandemic period, through the pandemic, to the post-emergency period, and whether they have returned to pre-pandemic levels. We collected Reddit data from 2019 (pre-pandemic), 2020 (peak pandemic), 2021, and 2022 (late stages of the pandemic, transitioning to the post-emergency period) from subreddits of 128 universities/colleges in the U.S., along with a set of school-level characteristics. We predicted two sets of sentiments using a pre-trained Robustly Optimized BERT pre-training approach (RoBERTa) and a graph attention network (GAT) that leverages both rich semantic and relational information among posted messages, and then applied a logistic stacking method to obtain the final sentiment classification. After obtaining a sentiment label for each message, we used a generalized linear mixed-effects model to estimate the temporal trend in sentiment from 2019 to 2022 and how school-level factors may affect sentiment. Compared to 2019, the odds of negative sentiment in 2020, 2021, and 2022 are 24%, 4.3%, and 10.3% higher, respectively, all statistically significant (adjusted $p$<0.05). Our findings suggest a partial recovery in sentiment composition in the post-pandemic-emergency era. The results align with common expectations and provide a detailed quantification of how sentiments evolved from 2019 to 2022.
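In a logistic mixed-effects model, the reported "24% higher odds" corresponds to exponentiating the fixed-effect coefficient for the year indicator. A small worked conversion (the helper name is ours):

```python
import math

def odds_increase(beta):
    """Convert a logistic-regression coefficient for a year indicator into
    the percent increase in odds of negative sentiment relative to the
    baseline year (here, 2019)."""
    return (math.exp(beta) - 1) * 100
```

For example, the reported 24% increase for 2020 corresponds to a coefficient of beta = ln(1.24) ≈ 0.215 on the 2020 indicator.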
https://arxiv.org/abs/2309.08845
Sentiment analysis is a pivotal task in the domain of natural language processing. It encompasses both text-level sentiment polarity classification and word-level Part of Speech (POS) sentiment polarity determination. Such analysis challenges models to understand text holistically while also extracting nuanced information. With the rise of Large Language Models (LLMs), new avenues for sentiment analysis have opened. This paper proposes enhancing performance by leveraging the Mutual Reinforcement Effect (MRE) between individual words and the overall text. It delves into how word polarity influences the overarching sentiment of a passage. To support our research, we annotated four novel Sentiment Text Classification and Part of Speech (SCPOS) datasets, building upon existing sentiment classification datasets. Furthermore, we developed a Universal Sentiment Analysis (USA) model with 7 billion parameters. Experimental results revealed that our model surpassed the performance of gpt-3.5-turbo across all four datasets, underscoring the significance of MRE in sentiment analysis.
https://arxiv.org/abs/2309.03787
The successful application of large pre-trained models such as BERT in natural language processing has attracted increasing attention from researchers. Since BERT typically acts as an end-to-end black box, classification systems based on it are usually difficult to interpret and lack robustness. This paper proposes a visual interpretation-based self-improving classification model that combines virtual adversarial training (VAT) and BERT models to address these problems. Specifically, a fine-tuned BERT model is used as a classifier to determine the sentiment of the text. The predicted sentiment classification labels are then used as part of the input to another BERT for spam classification, trained in a semi-supervised manner using VAT. Additionally, visualization techniques, including visualizing the importance of words and normalizing the attention head matrix, are employed to analyze the relevance of each component to classification accuracy. Moreover, the visual analysis uncovers brand-new features, and classification performance is improved. Experimental results on Twitter's tweet dataset demonstrate the effectiveness of the proposed model on the classification task. Furthermore, the ablation study results illustrate the effect of different components of the proposed model on the classification results.
https://arxiv.org/abs/2309.01196
Aspect-based sentiment classification is a crucial problem in fine-grained sentiment analysis, which aims to predict the sentiment polarity of a given aspect according to its context. Previous works have made remarkable progress in leveraging attention mechanisms to extract opinion words for different aspects. However, a persistent challenge is the effective management of semantic mismatches, which stem from attention mechanisms that fall short of adequately aligning opinion words with their corresponding aspects in multi-aspect sentences. To address this issue, we propose a novel Aspect-oriented Opinion Alignment Network (AOAN) to capture the contextual association between opinion words and the corresponding aspect. Specifically, we first introduce a neighboring span enhanced module that highlights various compositions of neighboring words and given aspects. In addition, we design a multi-perspective attention mechanism that aligns relevant opinion information with respect to the given aspect. Extensive experiments on three benchmark datasets demonstrate that our model achieves state-of-the-art results. The source code is available at this https URL.
https://arxiv.org/abs/2308.11447
Financial sentiment analysis plays a crucial role in decoding market trends and guiding strategic trading decisions. Despite the deployment of advanced deep learning techniques and language models to refine sentiment analysis in finance, this study breaks new ground by investigating the potential of large language models, particularly ChatGPT 3.5, in financial sentiment analysis, with a strong emphasis on the foreign exchange market (forex). Employing a zero-shot prompting approach, we examine multiple ChatGPT prompts on a meticulously curated dataset of forex-related news headlines, measuring performance using metrics such as precision, recall, f1-score, and Mean Absolute Error (MAE) of the sentiment class. Additionally, we probe the correlation between predicted sentiment and market returns as an additional evaluation approach. ChatGPT, compared to FinBERT, a well-established sentiment analysis model for financial texts, exhibited approximately 35\% enhanced performance in sentiment classification and a 36\% higher correlation with market returns. By underlining the significance of prompt engineering, particularly in zero-shot contexts, this study spotlights ChatGPT's potential to substantially boost sentiment analysis in financial applications. By sharing the utilized dataset, our intention is to stimulate further research and advancements in the field of financial services.
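One less common metric above is the Mean Absolute Error of the sentiment class, which treats the sentiment labels as ordered categories. A minimal sketch of how such a metric can be computed; the three-class ordering is our assumption, since the abstract does not specify the exact mapping:

```python
def sentiment_mae(y_true, y_pred,
                  order=("negative", "neutral", "positive")):
    """MAE over ordinal sentiment classes: the absolute distance between
    predicted and true class indices, averaged over examples.

    Predicting 'neutral' for a 'negative' headline costs 1; predicting
    'positive' for it costs 2 -- unlike accuracy, the metric penalizes
    far-off mistakes more heavily.
    """
    idx = {c: i for i, c in enumerate(order)}
    errs = [abs(idx[t] - idx[p]) for t, p in zip(y_true, y_pred)]
    return sum(errs) / len(errs)
```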
https://arxiv.org/abs/2308.07935
The joint task of Dialog Sentiment Classification (DSC) and Act Recognition (DAR) aims to predict the sentiment label and act label for each utterance in a dialog simultaneously. However, current methods encode the dialog context in only one direction, which limits their ability to thoroughly comprehend the context. Moreover, these methods overlook the explicit correlations between sentiment and act labels, which leads to an insufficient ability to capture rich sentiment and act clues and hinders effective and accurate reasoning. To address these issues, we propose a Bi-directional Multi-hop Inference Model (BMIM) that leverages a feature selection network and a bi-directional multi-hop inference network to iteratively extract and integrate rich sentiment and act clues in a bi-directional manner. We also employ contrastive learning and dual learning to explicitly model the correlations of sentiment and act labels. Our experiments on two widely-used datasets show that BMIM outperforms state-of-the-art baselines by at least 2.6% on F1 score in DAR and 1.4% on F1 score in DSC. Additionally, our proposed model not only improves the performance but also enhances the interpretability of the joint sentiment and act prediction task.
https://arxiv.org/abs/2308.04424
Internet Memes remain a challenging form of user-generated content for automated sentiment classification. The availability of labelled memes is a barrier to developing sentiment classifiers of multimodal memes. To address the shortage of labelled memes, we propose to supplement the training of a multimodal meme classifier with unimodal (image-only and text-only) data. In this work, we present a novel variant of supervised intermediate training that uses relatively abundant sentiment-labelled unimodal data. Our results show a statistically significant performance improvement from the incorporation of unimodal text data. Furthermore, we show that the training set of labelled memes can be reduced by 40% without reducing the performance of the downstream model.
https://arxiv.org/abs/2308.00528
Pre-trained models for Czech Natural Language Processing are often evaluated on purely linguistic tasks (POS tagging, parsing, NER) and relatively simple classification tasks such as sentiment classification or article classification from a single news source. As an alternative, we present the CZEch NEws Classification dataset (CZE-NEC), one of the largest Czech classification datasets, composed of news articles from various sources spanning over twenty years, which allows a more rigorous evaluation of such models. We define four classification tasks: news source, news category, inferred author's gender, and day of the week. To verify the task difficulty, we conducted a human evaluation, which revealed that human performance lags behind strong machine-learning baselines built upon pre-trained transformer models. Furthermore, we show that language-specific pre-trained encoders outperform selected commercially available large-scale generative language models.
https://arxiv.org/abs/2307.10666
Multilingual pretrained language models (MPLMs) have demonstrated substantial performance improvements in zero-shot cross-lingual transfer across various natural language understanding tasks by finetuning MPLMs on task-specific labelled data of a source language (e.g. English) and evaluating on a wide range of target languages. Recent studies show that prompt-based finetuning surpasses regular finetuning in few-shot scenarios. However, the exploration of prompt-based learning in multilingual tasks remains limited. In this study, we propose the ProFiT pipeline to investigate the cross-lingual capabilities of Prompt-based Finetuning. We conduct comprehensive experiments on diverse cross-lingual language understanding tasks (sentiment classification, paraphrase identification, and natural language inference) and empirically analyze the variation trends of prompt-based finetuning performance in cross-lingual transfer across different few-shot and full-data settings. Our results reveal the effectiveness and versatility of prompt-based finetuning in cross-lingual language understanding. Our findings indicate that prompt-based finetuning outperforms vanilla finetuning in full-data scenarios and exhibits greater advantages in few-shot scenarios, with different performance patterns dependent on task types. Additionally, we analyze underlying factors such as language similarity and pretraining data size that impact the cross-lingual performance of prompt-based finetuning. Overall, our work provides valuable insights into the cross-lingual prowess of prompt-based finetuning.
https://arxiv.org/abs/2307.07880
Customer feedback is invaluable to companies as they refine their products. Monitoring customer feedback can be automated with Aspect Level Sentiment Classification (ALSC), which allows us to analyse specific aspects of the products in reviews. Large Language Models (LLMs) are the heart of many state-of-the-art ALSC solutions, but they perform poorly in some scenarios requiring Coreference Resolution (CR). In this work, we propose a framework to improve an LLM's performance on CR-containing reviews by fine-tuning on highly inferential tasks. We show that the performance improvement is likely attributable to the improved model CR ability. We also release a new dataset that focuses on CR in ALSC.
https://arxiv.org/abs/2307.05646
Dual-task dialog language understanding aims to tackle two correlative dialog language understanding tasks simultaneously by leveraging their inherent correlations. In this paper, we put forward a new framework whose core is relational temporal graph reasoning. We propose a speaker-aware temporal graph (SATG) and a dual-task relational temporal graph (DRTG) to facilitate relational temporal modeling in dialog understanding and dual-task reasoning. Besides, different from previous works that only achieve implicit semantics-level interactions, we propose to model explicit dependencies by integrating prediction-level interactions. To implement our framework, we first propose a novel model, the Dual-tAsk temporal Relational rEcurrent Reasoning network (DARER), which first generates context-, speaker-, and temporal-sensitive utterance representations through relational temporal modeling of the SATG, then conducts recurrent dual-task relational temporal graph reasoning on the DRTG, a process in which the estimated label distributions act as key clues in prediction-level interactions. The relational temporal modeling in DARER is achieved by relational graph convolutional networks (RGCNs). We then further propose the Relational Temporal Transformer (ReTeFormer), which achieves fine-grained relational temporal modeling via Relation- and Structure-aware Disentangled Multi-head Attention. Accordingly, we propose DARER with ReTeFormer (DARER2), which adopts two variants of ReTeFormer to achieve the relational temporal modeling of the SATG and the DRTG, respectively. Extensive experiments on different scenarios verify that our models outperform state-of-the-art models by a large margin. Remarkably, on the dialog sentiment classification task in the Mastodon dataset, DARER and DARER2 gain relative improvements of about 28% and 34% over the previous best model in terms of F1.
https://arxiv.org/abs/2306.09114
Despite impressive advancements in multilingual corpora collection and model training, developing large-scale deployments of multilingual models still presents a significant challenge. This is particularly true for language tasks that are culture-dependent. One such example is the area of multilingual sentiment analysis, where affective markers can be subtle and deeply ensconced in culture. This work presents the most extensive open massively multilingual corpus of datasets for training sentiment models. The corpus consists of 79 manually selected datasets from over 350 datasets reported in the scientific literature based on strict quality criteria. The corpus covers 27 languages representing 6 language families. Datasets can be queried using several linguistic and functional features. In addition, we present a multi-faceted sentiment classification benchmark summarizing hundreds of experiments conducted on different base models, training objectives, dataset collections, and fine-tuning strategies.
https://arxiv.org/abs/2306.07902
The rapid deployment of artificial intelligence (AI) models demands a thorough investigation of biases and risks inherent in these models to understand their impact on individuals and society. This study extends the focus of bias evaluation in extant work by examining bias against social stigmas on a large scale. It focuses on 93 stigmatized groups in the United States, including a wide range of conditions related to disease, disability, drug use, mental illness, religion, sexuality, socioeconomic status, and other relevant factors. We investigate bias against these groups in English pre-trained Masked Language Models (MLMs) and their downstream sentiment classification tasks. To evaluate the presence of bias against 93 stigmatized conditions, we identify 29 non-stigmatized conditions to conduct a comparative analysis. Building upon a psychology scale of social rejection, the Social Distance Scale, we prompt six MLMs: RoBERTa-base, RoBERTa-large, XLNet-large, BERTweet-base, BERTweet-large, and DistilBERT. We use human annotations to analyze the predicted words from these models, with which we measure the extent of bias against stigmatized groups. When prompts include stigmatized conditions, the probability of MLMs predicting negative words is approximately 20 percent higher than when prompts have non-stigmatized conditions. In the sentiment classification tasks, when sentences include stigmatized conditions related to diseases, disability, education, and mental illness, they are more likely to be classified as negative. We also observe a strong correlation between bias in MLMs and their downstream sentiment classifiers (r = 0.79). The evidence indicates that MLMs and their downstream sentiment classification tasks exhibit biases against socially stigmatized groups.
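The measurement described above boils down to comparing how often a model's predicted fill-in words are judged negative under stigmatized versus non-stigmatized prompts. A toy sketch, where a lexicon set stands in for the paper's human annotations (all names are illustrative):

```python
def negative_word_rate(predicted_words, negative_lexicon):
    """Fraction of a model's predicted fill-in words judged negative.

    negative_lexicon is an illustrative stand-in for the paper's human
    annotation step.
    """
    hits = sum(1 for w in predicted_words if w in negative_lexicon)
    return hits / len(predicted_words)

def relative_increase(stigma_rate, control_rate):
    """Percent by which the stigmatized-prompt negative rate exceeds the
    non-stigmatized (control) rate."""
    return (stigma_rate / control_rate - 1) * 100
```

Under this framing, the paper's finding corresponds to `relative_increase` being roughly 20 when aggregated over the stigmatized versus non-stigmatized prompt sets.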
https://arxiv.org/abs/2306.05550
Prompting large language models has gained immense popularity in recent years due to the advantage of producing good results even without the need for labelled data. However, this requires prompt tuning to get optimal prompts that lead to better model performances. In this paper, we explore the use of soft-prompt tuning on sentiment classification task to quantify the biases of large language models (LLMs) such as Open Pre-trained Transformers (OPT) and Galactica language model. Since these models are trained on real-world data that could be prone to bias toward certain groups of populations, it is important to identify these underlying issues. Using soft-prompts to evaluate bias gives us the extra advantage of avoiding the human-bias injection that can be caused by manually designed prompts. We check the model biases on different sensitive attributes using the group fairness (bias) and find interesting bias patterns. Since LLMs have been used in the industry in various applications, it is crucial to identify the biases before deploying these models in practice. We open-source our pipeline and encourage industry researchers to adapt our work to their use cases.
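Group fairness, as referenced above, is commonly operationalized as the gap in positive-classification rates across sensitive-attribute groups (demographic parity). The abstract does not name the exact metric used, so the following is an assumed, minimal sketch:

```python
from collections import defaultdict

def demographic_parity_gap(preds, groups):
    """Max gap in positive-classification rate across sensitive-attribute
    groups. One common group-fairness (bias) metric; illustrative only.

    preds: 0/1 sentiment predictions; groups: the sensitive-attribute value
    of each example (e.g. a demographic group mentioned in the text).
    """
    pos, tot = defaultdict(int), defaultdict(int)
    for p, g in zip(preds, groups):
        tot[g] += 1
        pos[g] += int(p == 1)
    rates = {g: pos[g] / tot[g] for g in tot}
    return max(rates.values()) - min(rates.values()), rates
```

A gap near zero suggests the classifier treats the groups similarly on this axis; a large gap flags a bias pattern worth investigating before deployment.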
https://arxiv.org/abs/2306.04735
This paper describes our system designed for SemEval-2023 Task 12: Sentiment analysis for African languages. The challenge faced by this task is the scarcity of labeled data and linguistic resources in low-resource settings. To alleviate these, we propose a generalized multilingual system SACL-XLMR for sentiment analysis on low-resource languages. Specifically, we design a lexicon-based multilingual BERT to facilitate language adaptation and sentiment-aware representation learning. Besides, we apply a supervised adversarial contrastive learning technique to learn sentiment-spread structured representations and enhance model generalization. Our system achieved competitive results, largely outperforming baselines on both multilingual and zero-shot sentiment classification subtasks. Notably, the system obtained the 1st rank on the zero-shot classification subtask in the official ranking. Extensive experiments demonstrate the effectiveness of our system.
https://arxiv.org/abs/2306.01093
Sentiment classification is one of the best use cases of classical natural language processing (NLP), where we can witness its power in various daily-life domains such as banking, business, and the marketing industry. We already know how classical AI and machine learning can change and improve technology. Quantum natural language processing (QNLP) is a young and gradually emerging technology which has the potential to provide quantum advantage for NLP tasks. In this paper we show the first application of QNLP to sentiment analysis, achieving perfect test-set accuracy for three different kinds of simulations and decent accuracy for experiments run on a noisy quantum device. We utilize the lambeq QNLP toolkit and $t|ket>$ by Cambridge Quantum (Quantinuum) to obtain the results.
https://arxiv.org/abs/2305.19383