According to the literature, product reviews are an important source of information that supports customers' buying decisions. They improve customer trust and loyalty, help customers understand what others think about a particular product, and ultimately drive purchase decisions. For an e-commerce platform it is therefore important to understand the sentiment in customer reviews, both to understand its own products and services and to create positive consumer interactions and long-lasting relationships. Reviews also open up innovative ways to market products. One such approach is nudge marketing: a subtle way for an e-commerce company to help its customers make better decisions without hesitation.
https://arxiv.org/abs/2311.10782
Recent advancements in natural language processing have led to the proliferation of large language models (LLMs). These models have been shown to yield good performance, using in-context learning, even on unseen tasks and languages. Additionally, they have been widely adopted as commercial language-model-as-a-service APIs such as the GPT-4 API. However, their performance on African languages is largely unknown. We present an analysis of three popular large language models (mT0, LLaMa 2, and GPT-4) on five tasks (news topic classification, sentiment classification, machine translation, question answering, and named entity recognition) across 30 African languages, spanning different language families and geographical regions. Our results suggest that all LLMs produce below-par performance on African languages, and there is a large gap in performance compared to high-resource languages like English on most tasks. We find that GPT-4 has an average to impressive performance on classification tasks but very poor results on generative tasks like machine translation. Surprisingly, we find that mT0 had the best overall performance on cross-lingual QA, better than the state-of-the-art supervised model (i.e., fine-tuned mT5) and GPT-4 on African languages. Overall, LLaMa 2 records the worst performance due to its limited multilingual capabilities and English-centric pre-training corpus. In general, our findings present a call to action to ensure African languages are well represented in large language models, given their growing popularity.
https://arxiv.org/abs/2311.07978
The brittleness of finetuned language model performance on out-of-distribution (OOD) test samples in unseen domains has been well-studied for English, yet is unexplored for multi-lingual models. Therefore, we study generalization to OOD test data specifically in zero-shot cross-lingual transfer settings, analyzing performance impacts of both language and domain shifts between train and test data. We further assess the effectiveness of counterfactually augmented data (CAD) in improving OOD generalization for the cross-lingual setting, since CAD has been shown to benefit in a monolingual English setting. Finally, we propose two new approaches for OOD generalization that avoid the costly annotation process associated with CAD, by exploiting the power of recent large language models (LLMs). We experiment with 3 multilingual models, LaBSE, mBERT, and XLM-R trained on English IMDb movie reviews, and evaluate on OOD test sets in 13 languages: Amazon product reviews, Tweets, and Restaurant reviews. Results echo the OOD performance decline observed in the monolingual English setting. Further, (i) counterfactuals from the original high-resource language do improve OOD generalization in the low-resource language, and (ii) our newly proposed cost-effective approaches reach similar or up to +3.1% better accuracy than CAD for Amazon and Restaurant reviews.
https://arxiv.org/abs/2311.06549
Research on data generation and augmentation has focused mainly on enhancing generation models, leaving a notable gap in the exploration and refinement of methods for evaluating synthetic data. Within the context of filtering generated data, several text-similarity metrics can impact the performance of specific Natural Language Understanding (NLU) tasks, particularly intent and sentiment classification. In this study, we propose RankAug, a text-ranking approach that detects and retains the top augmented texts: those closest in meaning to the original while exhibiting lexical and syntactic diversity. Through experiments conducted on multiple datasets, we demonstrate that the judicious selection of filtering techniques can yield a substantial improvement of up to 35% in classification accuracy for under-represented classes.
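The filtering idea can be illustrated with a toy ranking function. This is a sketch only, not the RankAug implementation: the Jaccard-overlap scoring heuristic, the near-duplicate penalty, and all names here are assumptions standing in for the paper's learned similarity metrics.

```python
def jaccard(a, b):
    """Word-set overlap between two texts: a crude proxy for preserved meaning."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def rank_augmentations(source, candidates, keep=2):
    """Keep augmented texts that are similar in meaning to the source
    (high overlap) but not near-verbatim copies (lexical diversity)."""
    def score(cand):
        sim = jaccard(source, cand)
        return sim if sim < 0.9 else sim - 1.0  # demote near-copies
    return sorted(candidates, key=score, reverse=True)[:keep]
```

In this sketch a verbatim copy of the source ranks below a genuine paraphrase, which is the behavior a diversity-aware filter is after.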
https://arxiv.org/abs/2311.04535
This paper presents different approaches to binary sentiment classification on a small training dataset. It uses language models that have delivered state-of-the-art results in sentiment analysis and similar domains, such as BERT, RoBERTa, and XLNet.
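For orientation, the task setup can be sketched with a tiny word-count classifier on a handful of examples. This is a toy baseline illustrating binary sentiment classification on small data, emphatically not the paper's transformer models; all data and names here are made up.

```python
import math
from collections import Counter

def train_nb(texts, labels):
    """Per-class word counts: a Naive-Bayes-style toy sentiment model."""
    counts = {0: Counter(), 1: Counter()}
    for t, y in zip(texts, labels):
        counts[y].update(t.lower().split())
    return counts

def predict(counts, text):
    """Score each class by add-one-smoothed log word evidence."""
    scores = {}
    for y, c in counts.items():
        total = sum(c.values()) + len(c) + 1
        scores[y] = sum(math.log((c[w] + 1) / total) for w in text.lower().split())
    return max(scores, key=scores.get)
```

With only a few labeled examples this already separates the classes, which is why such baselines are the usual point of comparison for fine-tuned models on small datasets.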
https://arxiv.org/abs/2311.04139
Aspect-based sentiment classification (ASC) aims to judge the sentiment polarity conveyed by a given aspect term in a sentence. The sentiment polarity is not only determined by the local context but also related to words far away from the given aspect term. Most recent attention-based models cannot, in some cases, sufficiently distinguish which words they should pay more attention to. Meanwhile, graph-based models have been introduced into ASC to encode syntactic dependency-tree information. However, these models do not fully leverage syntactic dependency trees, as they neglect to incorporate dependency relation tag information into representation learning effectively. In this paper, we address these problems by effectively modeling both local and global features. First, we design a local encoder containing a Gaussian mask layer and a covariance self-attention layer. The Gaussian mask layer adaptively adjusts the receptive field around aspect terms to de-emphasize unrelated words and pay more attention to local information. The covariance self-attention layer distinguishes the attention weights of different words more sharply. Furthermore, we propose a dual-level graph attention network as a global encoder that fully employs dependency tag information to capture long-distance information effectively. Our model achieves state-of-the-art performance on both the SemEval 2014 and Twitter datasets.
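The core of a Gaussian mask layer can be sketched in a few lines: positions near the aspect term get weights close to 1, distant positions are smoothly suppressed. This is only the fixed-width version of the idea; in the paper the window width is adjusted adaptively, which this sketch does not attempt.

```python
import math

def gaussian_mask(seq_len, aspect_pos, sigma=2.0):
    """Weight each token position by a Gaussian centered on the aspect
    term, de-emphasizing unrelated distant words."""
    return [math.exp(-((i - aspect_pos) ** 2) / (2 * sigma ** 2))
            for i in range(seq_len)]
```

Multiplying attention scores by such a mask biases the model toward the local context of the aspect term while still letting strong distant signals through.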
https://arxiv.org/abs/2311.01030
We propose a causal interpretation of self-attention in the Transformer neural network architecture. We interpret self-attention as a mechanism that estimates a structural equation model for a given input sequence of symbols (tokens). The structural equation model can be interpreted, in turn, as a causal structure over the input symbols under the specific context of the input sequence. Importantly, this interpretation remains valid in the presence of latent confounders. Following this interpretation, we estimate conditional independence relations between input symbols by calculating partial correlations between their corresponding representations in the deepest attention layer. This enables learning the causal structure over an input sequence using existing constraint-based algorithms. In this sense, existing pre-trained Transformers can be utilized for zero-shot causal-discovery. We demonstrate this method by providing causal explanations for the outcomes of Transformers in two tasks: sentiment classification (NLP) and recommendation.
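The partial-correlation computation at the heart of this approach follows the standard recursive formula used by constraint-based causal discovery algorithms; the sketch below shows it for a single conditioning variable (in the paper, applied to token representations from the deepest attention layer rather than raw vectors).

```python
import math

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = math.sqrt(sum((a - mx) ** 2 for a in x))
    vy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy)

def partial_corr(x, y, z):
    """Partial correlation of x and y controlling for z:
    r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2))."""
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

A near-zero partial correlation is taken as evidence of conditional independence, which is exactly the test a constraint-based algorithm (e.g. PC) runs to prune edges in the causal structure.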
https://arxiv.org/abs/2310.20307
Sentiment analysis is a fundamental and valuable task in NLP. However, due to limitations in data and technological availability, research into sentiment analysis of African languages has been fragmented and lacking. With the recent release of the AfriSenti-SemEval Shared Task 12, hosted as part of the 17th International Workshop on Semantic Evaluation, an annotated sentiment analysis dataset covering 14 African languages was made available. We benchmarked and compared current state-of-the-art transformer models across 12 languages and compared the performance of training one model per language versus a single model for all languages. We also evaluated the performance of standard multilingual models and their ability to learn and transfer cross-lingual representations from non-African to African languages. Our results show that, despite work in low-resource modeling, more data still produces better models on a per-language basis. Models explicitly developed for African languages outperform other models on all tasks. Additionally, no one-model-fits-all solution emerges from the per-language evaluation. Moreover, for some languages with smaller sample sizes, a larger multilingual model may perform better than a dedicated per-language model for sentiment classification.
https://arxiv.org/abs/2310.14120
Distribution shifts between train and test datasets obscure our ability to understand the generalization capacity of neural network models. This topic is especially relevant given the success of pre-trained foundation models as starting points for transfer learning (TL) models across tasks and contexts. We present a case study for TL on a pre-trained GPT-2 model onto the Sentiment140 dataset for sentiment classification. We show that Sentiment140's test dataset $M$ is not sampled from the same distribution as the training dataset $P$, and hence training on $P$ and measuring performance on $M$ does not actually account for the model's generalization on sentiment classification.
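A minimal, illustrative way to probe whether two text samples come from the same distribution is to compare their unigram frequency distributions, e.g. by total variation distance. This is a crude sketch of the kind of train/test shift check the case study motivates, not the paper's actual analysis of Sentiment140; the threshold-free distance here is only a diagnostic.

```python
from collections import Counter

def token_dist(texts):
    """Unigram frequency distribution over a list of texts."""
    c = Counter(w for t in texts for w in t.lower().split())
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

def total_variation(p, q):
    """Total variation distance between two unigram distributions:
    0 for identical samples, 1 for fully disjoint vocabularies."""
    vocab = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in vocab)
```

A large distance between the training set $P$ and the test set $M$ is a red flag that test accuracy on $M$ does not measure generalization on the task's true distribution.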
https://arxiv.org/abs/2310.13836
Temporal data distribution shift is prevalent in financial text. How can a financial sentiment analysis system be trained, in a volatile market environment, to accurately infer sentiment and remain robust to temporal data distribution shifts? In this paper, we conduct an empirical study of financial sentiment analysis systems under temporal data distribution shifts, using a real-world financial social media dataset that spans three years. We find that fine-tuned models suffer from general performance degradation in the presence of temporal distribution shifts. Furthermore, motivated by the unique temporal nature of financial text, we propose a novel method that combines out-of-distribution detection with time-series modeling for temporal financial sentiment analysis. Experimental results show that the proposed method enhances the model's capability to adapt to evolving temporal shifts in a volatile financial market.
https://arxiv.org/abs/2310.12620
In the financial domain, conducting entity-level sentiment analysis is crucial for accurately assessing the sentiment directed toward a specific financial entity. To our knowledge, no publicly available dataset currently exists for this purpose. In this work, we introduce an entity-level sentiment classification dataset, called \textbf{FinEntity}, that annotates financial entity spans and their sentiment (positive, neutral, and negative) in financial news. We document the dataset construction process in the paper. Additionally, we benchmark several pre-trained models (BERT, FinBERT, etc.) and ChatGPT on entity-level sentiment classification. In a case study, we demonstrate the practical utility of FinEntity in monitoring cryptocurrency markets. The data and code of FinEntity are available at \url{this https URL}.
https://arxiv.org/abs/2310.12406
Weak supervision has emerged as a promising approach for rapid and large-scale dataset creation in response to the increasing demand for accelerated NLP development. By leveraging labeling functions, weak supervision allows practitioners to generate datasets quickly by creating learned label models that produce soft-labeled datasets. This paper aims to show how such an approach can be utilized to build an Indonesian NLP dataset from conservation news text. We construct two types of datasets: multi-class classification and sentiment classification. We then provide baseline experiments using various pretrained language models. These baseline results demonstrate test performances of 59.79% accuracy and 55.72% F1-score for sentiment classification, 66.87% F1-score-macro, 71.5% F1-score-micro, and 83.67% ROC-AUC for multi-class classification. Additionally, we release the datasets and labeling functions used in this work for further research and exploration.
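The labeling-function mechanism can be sketched concretely. The keyword rules and the simple vote-averaging below are illustrative assumptions: real weak-supervision pipelines (e.g. Snorkel-style) fit a learned label model over the labeling-function outputs rather than averaging votes.

```python
ABSTAIN = -1

def lf_positive_keywords(text):
    """Vote positive (1) if an obviously positive word appears."""
    return 1 if any(w in text.lower() for w in ("good", "love", "great")) else ABSTAIN

def lf_negative_keywords(text):
    """Vote negative (0) if an obviously negative word appears."""
    return 0 if any(w in text.lower() for w in ("bad", "hate", "awful")) else ABSTAIN

def soft_label(text, lfs):
    """Average the non-abstaining votes into a soft label P(label = 1);
    return None when no labeling function fires."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return None
    return sum(votes) / len(votes)
```

The resulting soft-labeled dataset is what the downstream classifier is trained on; messages where all functions abstain are simply left out.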
https://arxiv.org/abs/2310.11258
With the recent advances in social media, the use of NLP techniques in social media data analysis has become an emerging research direction. Business organizations can particularly benefit from such an analysis of social media discourse, which provides an external perspective on consumer behavior. NLP applications such as intent detection, sentiment classification, and text summarization can help FinTech organizations use social media language data to find useful external insights, which can be further utilized for downstream NLP tasks. In particular, a summary that highlights the intents and sentiments of users can give these organizations an external perspective, helping them better manage their products, offers, promotional campaigns, etc. However, certain challenges, such as a lack of labeled domain-specific datasets, impede further exploration of these tasks in the FinTech domain. To overcome these challenges, we design an unsupervised phrase-based summary generation method for social media data, using 'Action-Object' pairs (intent phrases). We evaluated the proposed method against other key-phrase-based summary generation methods in terms of how much contextual information from various Reddit discussion threads is preserved in the different summaries. We introduce "Context Metrics" such as the number of unique words, Action-Object pairs, and noun chunks to evaluate the contextual information retrieved from the source text in these phrase-based summaries. We demonstrate that our methods significantly outperform the baseline on these metrics, thus providing a qualitative and quantitative measure of their efficacy. The proposed framework has been deployed as a web utility portal hosted within Amex.
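To make "Action-Object pairs" concrete, here is a deliberately naive extractor that pairs a known action verb with the next known object noun. The word lists and pairing rule are toy assumptions; a real pipeline would rely on dependency parsing rather than fixed lexicons.

```python
# Hypothetical toy lexicons, for illustration only.
VERBS = {"buy", "cancel", "upgrade", "refund"}
NOUNS = {"card", "subscription", "account", "order"}

def action_object_pairs(text):
    """Pair each known action verb with the nearest following known
    object noun, yielding intent phrases like ('cancel', 'subscription')."""
    tokens = text.lower().split()
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in VERBS:
            for nxt in tokens[i + 1:]:
                if nxt in NOUNS:
                    pairs.append((tok, nxt))
                    break
    return pairs
```

Counting such pairs across a summary is one of the "Context Metrics" described above: a summary that preserves more Action-Object pairs retains more of the thread's actionable intent.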
https://arxiv.org/abs/2310.10294
Recently, Target-oriented Multimodal Sentiment Classification (TMSC) has gained significant attention among scholars. However, current multimodal models have reached a performance bottleneck. To investigate the causes of this problem, we perform extensive empirical evaluation and in-depth analysis of the datasets to answer the following questions: Q1: Are the modalities equally important for TMSC? Q2: Which multimodal fusion modules are more effective? Q3: Do existing datasets adequately support the research? Our experiments and analyses reveal that the current TMSC systems primarily rely on the textual modality, as most of targets' sentiments can be determined solely by text. Consequently, we point out several directions to work on for the TMSC task in terms of model design and dataset construction. The code and data can be found in this https URL.
https://arxiv.org/abs/2310.09596
Sentiment Analysis (SA) is an indispensable task for many real-world applications. Compared to low-resource languages (e.g., Arabic, Bengali), most SA research has been conducted on high-resource languages (e.g., English, Chinese). Moreover, the reasoning behind the predictions of Arabic sentiment analysis methods that exploit advanced artificial intelligence (AI)-based approaches is black-box-like and quite difficult to understand. This paper proposes an explainable sentiment classification framework for the Arabic language by introducing a noise layer into Bi-Directional Long Short-Term Memory (BiLSTM) and Convolutional Neural Network (CNN)-BiLSTM models to overcome the over-fitting problem. The proposed framework can explain specific predictions by training a local surrogate explainable model to understand why a particular sentiment (positive or negative) is predicted. We carried out experiments on public benchmark Arabic SA datasets. The results show that adding noise layers improves sentiment analysis performance for Arabic by reducing over-fitting, and our method outperformed some known state-of-the-art methods. In addition, the introduced explainability, combined with the noise layer, can make the model more transparent and accountable and hence help in adopting AI-enabled systems in practice.
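The noise-layer idea in its generic form is small enough to sketch: perturb activations with Gaussian noise during training, so the network cannot memorize exact inputs, and pass activations through unchanged at inference. This is the textbook regularizer, not necessarily the paper's exact layer or placement within the BiLSTM/CNN-BiLSTM stacks.

```python
import random

def noise_layer(vector, sigma=0.1, training=True, rng=None):
    """Add zero-mean Gaussian noise to each activation during training
    (a regularizer against over-fitting); identity at inference time."""
    if not training:
        return list(vector)
    rng = rng or random.Random()
    return [v + rng.gauss(0.0, sigma) for v in vector]
```

Because the perturbation is only applied during training, the inference path is deterministic and the learned weights simply become less sensitive to small input variations.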
https://arxiv.org/abs/2309.13731
This paper presents BELT, a novel model and learning framework for the pivotal topic of brain-to-language translation research. Translating noninvasive brain signals into readable natural language has the potential to broaden the application scenarios of brain-computer interfaces (BCI) and promote their development as a whole. The critical problem in brain-signal decoding, or brain-to-language translation, is acquiring semantically appropriate and discriminative EEG representations from a dataset of limited scale and quality. The proposed BELT method is a generic and efficient framework that bootstraps EEG representation learning using off-the-shelf large-scale pretrained language models (LMs). With a large LM's capacity for understanding semantic information and zero-shot generalization, BELT utilizes large LMs trained on Internet-scale datasets to bring significant improvements to the understanding of EEG signals. In particular, the BELT model is composed of a deep conformer encoder and a vector-quantization encoder. Semantic EEG representations are achieved through a contrastive learning step that provides natural-language supervision. We achieve state-of-the-art results on two featured brain-decoding tasks: brain-to-language translation and zero-shot sentiment classification. Specifically, our model surpasses the baseline model on the two tasks by 5.45% and over 10%, respectively, and achieves a 42.31% BLEU-1 score and 67.32% precision on the main evaluation metrics for translation and zero-shot sentiment classification.
https://arxiv.org/abs/2309.12056
With the rapid development of big data and computing devices, low-latency automatic trading platforms based on real-time information acquisition have become the main components of the stock trading market, so the topic of quantitative trading has received widespread attention. In markets that are not strongly efficient, human emotions and expectations always dominate market trends and trading decisions. This paper therefore starts from the theory of emotion, taking East Money as an example, crawling user comment titles from its corresponding stock forum and cleaning the data. Subsequently, a BERT-based natural language processing model was constructed and fine-tuned using existing annotated datasets. The experimental results show that the fine-tuned model achieves varying degrees of performance improvement over both the original model and the baseline model. Based on this model, the crawled user comments are labeled with emotional polarity, and the resulting label information is combined with the Alpha191 model in a regression, yielding significant regression results. The regression model is then used to predict the average price change over the next five days, which serves as a signal to guide automatic trading. The experimental results show that incorporating emotional factors increased the return rate by 73.8% compared to the baseline during the trading period, and by 32.41% compared to the original Alpha191 model. Finally, we discuss the advantages and disadvantages of incorporating emotional factors into quantitative trading and give possible directions for further research.
https://arxiv.org/abs/2309.11979
The Mixup method has proven to be a powerful data augmentation technique in Computer Vision, with many successors that perform image mixing in a guided manner. One of the interesting research directions is transferring the underlying Mixup idea to other domains, e.g. Natural Language Processing (NLP). Even though there already exist several methods that apply Mixup to textual data, there is still room for new, improved approaches. In this work, we introduce AttentionMix, a novel mixing method that relies on attention-based information. While the paper focuses on the BERT attention mechanism, the proposed approach can be applied to generally any attention-based model. AttentionMix is evaluated on 3 standard sentiment classification datasets and in all three cases outperforms two benchmark approaches that utilize Mixup mechanism, as well as the vanilla BERT method. The results confirm that the attention-based information can be effectively used for data augmentation in the NLP domain.
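The vanilla Mixup interpolation that AttentionMix builds on is a one-liner per tensor: draw a mixing coefficient from a Beta distribution and interpolate both inputs and one-hot labels. The sketch below shows only this underlying operation; AttentionMix's contribution, deriving the mixing weights from attention information instead of a single uniform coefficient, is not reproduced here.

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Vanilla Mixup: x' = lam*x1 + (1-lam)*x2, and the same convex
    combination for the one-hot labels, with lam ~ Beta(alpha, alpha)."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

In text models the interpolation is typically applied to embeddings or hidden states rather than raw tokens, since discrete tokens cannot be mixed directly.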
https://arxiv.org/abs/2309.11104
In-context learning (ICL) using large language models for tasks with many labels is challenging due to the limited context window, which makes it difficult to fit a sufficient number of examples in the prompt. In this paper, we use a pre-trained dense retrieval model to bypass this limitation, giving the model only a partial view of the full label space for each inference call. Testing with recent open-source LLMs (OPT, LLaMA), we set new state of the art performance in few-shot settings for three common intent classification datasets, with no finetuning. We also surpass fine-tuned performance on fine-grained sentiment classification in certain cases. We analyze the performance across number of in-context examples and different model scales, showing that larger models are necessary to effectively and consistently make use of larger context lengths for ICL. By running several ablations, we analyze the model's use of: a) the similarity of the in-context examples to the current input, b) the semantic content of the class names, and c) the correct correspondence between examples and labels. We demonstrate that all three are needed to varying degrees depending on the domain, contrary to certain recent works.
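The retrieval step can be sketched with bag-of-words cosine similarity standing in for the paper's pre-trained dense retriever (an assumption made purely so the example is self-contained): for each query, keep only the k most similar labeled examples, so the prompt exposes just a partial view of the full label space.

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity between two texts."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_demonstrations(query, pool, k=2):
    """Select the k labeled examples most similar to the query; only
    their labels become visible in the prompt for this inference call."""
    ranked = sorted(pool, key=lambda ex: cosine(query, ex["text"]), reverse=True)
    demos = ranked[:k]
    visible_labels = {ex["label"] for ex in demos}
    return demos, visible_labels
```

Because only the retrieved examples' labels enter the context, a task with hundreds of intents still fits in a limited context window, at the cost of the model never seeing the labels the retriever pruned away.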
https://arxiv.org/abs/2309.10954
As the impact of the COVID-19 pandemic winds down, both individuals and society gradually return to pre-pandemic activities. This study aims to explore how people's emotions changed from the pre-pandemic period, through the pandemic, to the post-emergency period, and whether they have returned to pre-pandemic levels. We collected Reddit data from 2019 (pre-pandemic), 2020 (peak pandemic), 2021, and 2022 (late stages of the pandemic, transitioning to the post-emergency period) from subreddits of 128 universities/colleges in the U.S., along with a set of school-level characteristics. We predicted two sets of sentiments using a pre-trained Robustly Optimized BERT pre-training approach (RoBERTa) model and a graph attention network (GAT) that leverages both rich semantic and relational information among posted messages, and then applied a logistic stacking method to obtain the final sentiment classification. After obtaining a sentiment label for each message, we used a generalized linear mixed-effects model to estimate the temporal trend in sentiment from 2019 to 2022 and how school-level factors may affect sentiment. Compared to 2019, the odds of negative sentiment in 2020, 2021, and 2022 are 24%, 4.3%, and 10.3% higher, respectively, all statistically significant (adjusted $p$<0.05). Our findings suggest a partial recovery in the sentiment composition in the post-pandemic-emergency era. The results align with common expectations and provide a detailed quantification of how sentiments evolved from 2019 to 2022.
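The logistic stacking step can be sketched as a small logistic regression over the per-message scores of the two base classifiers. The feature layout, learning rate, and training loop below are illustrative assumptions, not the paper's setup; in practice the base scores would come from the RoBERTa and GAT models rather than hand-typed numbers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stacker(features, labels, lr=0.5, epochs=500):
    """Fit logistic-regression weights over base-classifier scores by
    plain stochastic gradient descent on the log loss."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def stack_predict(w, b, x):
    """Final stacked probability that the message is negative."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

The stacker learns how much to trust each base model, which is the point of stacking over simple averaging: if one classifier is more reliable, it receives a larger weight.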
https://arxiv.org/abs/2309.08845