Distributed representations provide a vector space that captures meaningful relationships between data instances. The distributed nature of these representations, however, entangles multiple attributes or concepts of a data instance (e.g., the topic or sentiment of a text, or characteristics of its author such as age and gender). Recent work has proposed the task of concept erasure, in which, rather than making a concept predictable, the goal is to remove an attribute from distributed representations while retaining as much other information from the original representation space as possible. In this paper, we propose a new distance metric learning-based objective, the Kernelized Rate-Distortion Maximizer (KRaM), for performing concept erasure. KRaM fits a transformation of representations to match a specified distance measure (defined by a labeled concept to erase) using a modified rate-distortion function. Specifically, KRaM's objective function aims to make instances with similar concept labels dissimilar in the learned representation space while retaining other information. We find that optimizing KRaM effectively erases various types of concepts: categorical, continuous, and vector-valued variables from data representations across diverse domains. We also provide a theoretical analysis of several properties of KRaM's objective. To assess the quality of the learned representations, we propose an alignment score to evaluate their similarity with the original representation space. Additionally, we conduct experiments to showcase KRaM's efficacy in various settings, from erasing binary gender variables in word embeddings to vector-valued variables in GPT-3 representations.
https://arxiv.org/abs/2312.00194
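A minimal sketch, in PyTorch, of the kind of log-det coding-rate term that rate-distortion objectives like the one described above typically build on. The concept-kernel weighting and fidelity term here are illustrative assumptions for the sketch, not KRaM's published formulation; see the paper for the actual objective.

```python
import torch

def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z) for an (n, d) matrix of representations."""
    n, d = Z.shape
    I = torch.eye(d, device=Z.device, dtype=Z.dtype)
    return 0.5 * torch.logdet(I + (d / (n * eps ** 2)) * Z.T @ Z)

def kram_like_loss(Z, Z_orig, concept_kernel, alpha=1.0):
    """Push same-concept instances apart (maximize their rate) while staying close to the
    original space. concept_kernel[i, j] is assumed large when i and j share the concept
    label to erase (a design assumption for this sketch)."""
    rate = coding_rate(concept_kernel @ Z)      # rate of concept-mixed representations
    fidelity = ((Z - Z_orig) ** 2).mean()       # stay close to the original embeddings
    return -rate + alpha * fidelity             # minimizing this maximizes the rate

# toy usage: 32 random 16-d representations with binary concept labels
Z0 = torch.randn(32, 16)
Z = Z0.clone().requires_grad_(True)
labels = torch.randint(0, 2, (32,))
K = (labels[:, None] == labels[None, :]).float()  # 1 where concept labels match
kram_like_loss(Z, Z0, K).backward()               # gradients flow to Z (or to an encoder)
```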
Turkish is one of the most widely spoken languages in the world. The wide use of the language on social media platforms such as Twitter, Instagram, and TikTok, together with the country's strategic position in world politics, makes it appealing to social network researchers and industry. To address this need, we introduce TurkishBERTweet, the first large-scale pre-trained language model for Turkish social media, built using almost 900 million tweets. The model shares the same architecture as the base BERT model but with a smaller input length, making TurkishBERTweet lighter than BERTurk and giving it significantly lower inference time. We trained our model following the RoBERTa pre-training approach and evaluated it on two text classification tasks: sentiment classification and hate speech detection. We demonstrate that TurkishBERTweet outperforms the other available alternatives in generalizability, and its lower inference time gives it a significant advantage for processing large-scale datasets. We also compared our model with commercial OpenAI solutions in terms of cost and performance to demonstrate that TurkishBERTweet is a scalable and cost-effective solution. As part of our research, we release TurkishBERTweet and fine-tuned LoRA adapters for the mentioned tasks under the MIT License to facilitate future research and applications on Turkish social media. Our TurkishBERTweet model is available at: this https URL
https://arxiv.org/abs/2311.18063
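A hedged sketch of how the released artifacts could be used: loading the model with Hugging Face transformers and attaching a LoRA adapter via peft for sentiment classification. The hub identifier and the hyperparameters below are assumptions; consult the linked repository for the official ones.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

model_id = "VRLLab/TurkishBERTweet"  # assumed hub identifier, check the paper's repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)

lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_CLS,          # keep the classification head trainable
    r=8, lora_alpha=16, lora_dropout=0.1,
    target_modules=["query", "value"],   # attach adapters to the attention projections
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()       # only LoRA weights + classifier are updated

batch = tokenizer(["bugün hava çok güzel"], return_tensors="pt", truncation=True)
print(model(**batch).logits)
```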
Artificial intelligence and machine learning have significantly bolstered the technological world. This paper explores the potential of transfer learning in natural language processing, focusing mainly on sentiment analysis. Models trained on big data can also be applied where data are scarce. The claim is that, compared to training models from scratch, transfer learning using pre-trained BERT models can increase sentiment classification accuracy. The study adopts an experimental design that uses the IMDb dataset of sentiment-labelled movie reviews. Pre-processing includes tokenization and encoding of the text data, making it suitable for NLP models. The dataset is used to train a BERT-based model, with performance measured by accuracy. The resulting accuracy is 100 per cent. Although perfect accuracy may appear impressive, it might be the result of overfitting or a lack of generalization, and further analysis is required to ensure the model's ability to handle diverse and unseen data. The findings underscore the effectiveness of transfer learning in NLP and its potential to excel in sentiment analysis tasks; however, the research calls for a cautious interpretation of the perfect accuracy and emphasizes the need for additional measures to validate the model's generalization.
https://arxiv.org/abs/2311.16965
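A minimal sketch of the described pipeline, assuming the standard Hugging Face stack: tokenize IMDb reviews and fine-tune bert-base-uncased with the Trainer API. Hyperparameters and subset sizes are illustrative only, not the paper's settings.

```python
import numpy as np
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

imdb = load_dataset("imdb")
tok = AutoTokenizer.from_pretrained("bert-base-uncased")

def encode(batch):
    return tok(batch["text"], truncation=True, padding="max_length", max_length=256)

imdb = imdb.map(encode, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": (np.argmax(logits, axis=-1) == labels).mean()}

args = TrainingArguments(output_dir="imdb-bert", num_train_epochs=2,
                         per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, compute_metrics=accuracy,
                  train_dataset=imdb["train"].shuffle(seed=42).select(range(5000)),
                  eval_dataset=imdb["test"].shuffle(seed=42).select(range(2000)))
trainer.train()
print(trainer.evaluate())  # held-out accuracy is the sanity check against overfitting
```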
Product reviews often contain a large number of implicit aspects and object-attribute co-existence cases. Unfortunately, many existing studies in Aspect-Based Sentiment Analysis (ABSA) have overlooked this issue, which can make it difficult to extract opinions comprehensively and fairly. In this paper, we propose a new task called Entity-Aspect-Opinion-Sentiment Quadruple Extraction (EASQE), which aims to hierarchically decompose aspect terms into entities and aspects to avoid information loss, non-exclusive annotations, and opinion misunderstandings in ABSA tasks. To facilitate research in this new task, we have constructed four datasets (Res14-EASQE, Res15-EASQE, Res16-EASQE, and Lap14-EASQE) based on the SemEval Restaurant and Laptop datasets. We have also proposed a novel two-stage sequence-tagging based Trigger-Opinion framework as the baseline for the EASQE task. Empirical evaluations show that our Trigger-Opinion framework can generate satisfactory EASQE results and can also be applied to other ABSA tasks, significantly outperforming state-of-the-art methods. We have made the four datasets and source code of Trigger-Opinion publicly available to facilitate further research in this area.
https://arxiv.org/abs/2311.16678
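For concreteness, one way the extracted entity-aspect-opinion-sentiment quadruples could be represented as a data structure; the field names are assumptions for illustration, not the datasets' official schema.

```python
from dataclasses import dataclass

@dataclass
class EASQEQuadruple:
    entity: str      # the object being discussed
    aspect: str      # the attribute of that entity
    opinion: str     # the opinion expression (trigger span)
    sentiment: str   # "positive" / "negative" / "neutral"

# "The pizza's crust was wonderfully crispy" ->
example = EASQEQuadruple(entity="pizza", aspect="crust",
                         opinion="wonderfully crispy", sentiment="positive")
print(example)
```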
While the performance of many text classification tasks has recently improved thanks to Pre-trained Language Models (PLMs), in this paper we show that they still suffer from a performance gap when the underlying distribution of topics changes. For example, a genre classifier trained on political topics often fails when tested on documents about sport or medicine. In this work, we quantify this phenomenon empirically with a large corpus and a large set of topics. Consequently, we verify that domain transfer remains challenging both for classic PLMs, such as BERT, and for modern large models, such as GPT-3. We also suggest and successfully test a possible remedy: after augmenting the training dataset with topically-controlled synthetic texts, the F1 score improves by up to 50% for some topics, nearing on-topic training results, while others show little to no improvement. While our empirical results focus on genre classification, our methodology is applicable to other classification tasks such as gender, authorship, or sentiment classification. The code and data to replicate the experiments are available at this https URL
https://arxiv.org/abs/2311.16083
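A sketch of the topically-controlled augmentation idea under stated assumptions: generate(prompt) stands in for whatever text generator is used (the paper's exact generation setup is not reproduced here), and the genre/topic lists are placeholders.

```python
# Generate synthetic texts for topics missing from training, then add them to the train set.
def generate(prompt: str) -> str:
    # placeholder: substitute a real LLM / text-generation call here
    return f"[synthetic text for prompt: {prompt}]"

GENRES = ["news report", "personal blog post", "scientific abstract"]   # placeholder genres
TARGET_TOPICS = ["sport", "medicine"]   # topics underrepresented in the original training data

synthetic = []
for topic in TARGET_TOPICS:
    for genre in GENRES:
        prompt = f"Write a short {genre} about {topic}."
        synthetic.append({"text": generate(prompt), "genre": genre, "topic": topic})

# train_data = original_train_data + synthetic   # then fine-tune the genre classifier as before
print(len(synthetic), "synthetic examples generated")
```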
Social media play a significant role in shaping public opinion and influencing ideological communities through information propagation. Our demo InfoPattern centers on the interplay between language and human ideology. The demo (Code: this https URL ) is capable of: (1) red teaming to simulate adversary responses from opposite ideology communities; (2) stance detection to identify the underlying political sentiments in each message; (3) information propagation graph discovery to reveal the evolution of claims across various communities over time. (Live Demo: this https URL )
https://arxiv.org/abs/2311.15642
Our research focuses on the crucial challenge of discerning text produced by Large Language Models (LLMs) from human-generated text, which holds significance for various applications. With ongoing discussions about attaining a model with such functionality, we present supporting evidence regarding the feasibility of such models. We evaluated our models on multiple datasets, including Twitter Sentiment, Football Commentary, Project Gutenberg, PubMedQA, and SQuAD, confirming the efficacy of the enhanced detection approaches. These datasets were sampled with intricate constraints encompassing every possibility, laying the foundation for future research. We evaluate text generated by GPT-3.5-Turbo against various detectors such as SVM, RoBERTa-base, and RoBERTa-large. The findings indicate that detection performance depends predominantly on the sequence length of the sentences.
https://arxiv.org/abs/2311.15425
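One of the detector baselines mentioned above, sketched with scikit-learn: a TF-IDF plus linear SVM classifier separating human-written from model-generated text. The training examples below are toy placeholders; the paper's features and splits will differ.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = ["an example human-written sentence ...", "an example model-generated sentence ..."]
labels = [0, 1]  # 0 = human, 1 = LLM-generated (toy placeholder data)

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1), LinearSVC())
detector.fit(texts, labels)
print(detector.predict(["a new sentence to classify"]))
```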
The application of machine learning to finance has become a familiar approach, even more so in stock market forecasting. The stock market is highly volatile, and huge amounts of data are generated every minute globally. The extraction of effective intelligence from this data is of critical importance. However, combining numerical stock data with qualitative text data can be a challenging task. In this work, we accomplish this and provide an unprecedented, publicly available dataset with technical and fundamental data and sentiment that we gathered from news archives, TV news captions, radio transcripts, tweets, daily financial newspapers, etc. The text data entries used for sentiment extraction total more than 1.4 million. The dataset comprises daily entries from January 2018 to December 2022 for 8 different companies and the Dow Jones Index as a whole. Holistic fundamental and technical data are provided training-ready for model learning and deployment. The predictive power of deep learning models is highly determined by the training data provided. This dataset would benefit research globally that incorporates qualitative intelligence for stock market forecasting. The dataset is made available at this https URL.
https://arxiv.org/abs/2311.15218
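A sketch of how the numerical and text-derived parts of such a dataset can be joined into training-ready rows with pandas: aggregate per-day sentiment scores and merge them onto daily price/technical data. Column names are assumptions for illustration, not the released dataset's schema.

```python
import pandas as pd

prices = pd.DataFrame({"date": ["2022-01-03", "2022-01-04"],
                       "close": [182.0, 179.7], "rsi_14": [61.2, 55.8]})
news = pd.DataFrame({"date": ["2022-01-03", "2022-01-03", "2022-01-04"],
                     "sentiment": [0.4, -0.1, 0.2]})  # one row per text item

daily_sent = news.groupby("date", as_index=False)["sentiment"].mean()
training_table = prices.merge(daily_sent, on="date", how="left")
print(training_table)  # one training-ready row per trading day
```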
The impact of non-deterministic outputs from Large Language Models (LLMs) is not well examined for financial text understanding tasks. Through a compelling case study on investing in the US equity market via news sentiment analysis, we uncover substantial variability in sentence-level sentiment classification results, underscoring the innate volatility of LLM outputs. These uncertainties cascade downstream, leading to more significant variations in portfolio construction and return. While tweaking the temperature parameter in the language model decoder presents a potential remedy, it comes at the expense of stifled creativity. Similarly, while ensembling multiple outputs mitigates the effect of volatile outputs, it demands a notable computational investment. This work furnishes practitioners with invaluable insights for adeptly navigating uncertainty in the integration of LLMs into financial decision-making, particularly in scenarios dictated by non-deterministic information.
https://arxiv.org/abs/2311.15180
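A sketch of the ensembling remedy under stated assumptions: classify_sentiment is a hypothetical wrapper around whatever LLM call is being used, and the majority vote makes explicit why the mitigation multiplies the computational cost per sentence.

```python
from collections import Counter

def classify_sentiment(text: str, temperature: float = 1.0) -> str:
    # hypothetical: wrap your LLM call here; returns "positive" / "negative" / "neutral"
    raise NotImplementedError

def ensemble_label(text: str, n_samples: int = 5, temperature: float = 1.0) -> str:
    """Query the non-deterministic model several times and majority-vote the label."""
    votes = Counter(classify_sentiment(text, temperature) for _ in range(n_samples))
    return votes.most_common(1)[0][0]  # costs n_samples calls per sentence

# Lowering `temperature` is the cheaper alternative: fewer calls, but less output diversity.
```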
When dealing with text data containing subjective labels like speaker emotions, inaccuracies or discrepancies among labelers are not uncommon. Such discrepancies can significantly affect the performance of machine learning algorithms. This study investigates the potential of identifying and addressing outliers in text data with subjective labels, aiming to enhance classification outcomes. We utilized the Deep SVDD algorithm, a one-class classification method, to detect outliers in nine text-based emotion and sentiment analysis datasets. Employing both a small language model (the DistilBERT base model with 66 million parameters) and non-deep-learning machine learning algorithms (decision tree, KNN, logistic regression, and LDA) as classifiers, our findings suggest that the removal of outliers can lead to enhanced results in most cases. Additionally, as outliers in such datasets are not necessarily unlearnable, we experimented with a large language model, DeBERTa v3 large with 131 million parameters, which can capture very complex patterns in data. We continued to observe performance enhancements across multiple datasets.
https://arxiv.org/abs/2311.16185
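A sketch of the outlier-filtering pipeline under stated assumptions: embed is a hypothetical helper returning sentence embeddings (e.g. pooled DistilBERT features), and scikit-learn's OneClassSVM stands in here for the Deep SVDD detector used in the study.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.linear_model import LogisticRegression

def embed(texts):
    # hypothetical: return an (n, d) array of text embeddings, e.g. pooled DistilBERT features
    raise NotImplementedError

def filter_and_train(texts, labels, contamination=0.05):
    """Drop one-class outliers, then train the downstream classifier on the inliers."""
    X = embed(texts)
    y = np.asarray(labels)
    detector = OneClassSVM(nu=contamination).fit(X)
    keep = detector.predict(X) == 1          # +1 = inlier, -1 = outlier
    clf = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
    return clf, keep
```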
In this paper, we discuss the nlpBDpatriots entry to the shared task on Sentiment Analysis of Bangla Social Media Posts organized at the first workshop on Bangla Language Processing (BLP) co-located with EMNLP. The main objective of this task is to identify the polarity of social media content using a Bangla dataset annotated with positive, neutral, and negative labels provided by the shared task organizers. Our best system for this task is a transfer learning approach with data augmentation which achieved a micro F1 score of 0.71. Our best system ranked 12th among 30 teams that participated in the competition.
https://arxiv.org/abs/2311.15032
Art created using generative Artificial Intelligence has taken the world by storm and generated excitement for many digital creators and technologists. However, the reception and reaction from artists have been mixed. Concerns about their artworks and styles being plagiarized for training datasets, and uncertainty around the future of digital art, sparked movements in artist communities shunning the use of AI for generating art and protecting artists' rights. Collaborating with these tools for novel creative use cases also sparked hope among some creators. Artists are an integral stakeholder in the rapidly evolving digital creativity industry, and understanding their concerns and hopes informs responsible development and use of creativity support tools. In this work, we study artists' sentiments about AI-generated art. We interviewed 7 artists and analyzed public posts from artists on the social media platforms Reddit, Twitter, and Artstation. We report artists' main concerns and hopes around AI-generated artwork, informing a way forward for inclusive development of these tools.
https://arxiv.org/abs/2311.13725
Multimodal Sentiment Analysis (MSA) has recently become a central research direction for many real-world applications. This proliferation is due to the fact that opinions are central to almost all human activities and are key influencers of our behaviors. In addition, the recent deployment of Deep Learning-based (DL) models has proven their high efficiency for a wide range of Western languages. In contrast, Arabic DL-based multimodal sentiment analysis is still in its infancy, mainly due to the lack of standard datasets. In this paper, our investigation is twofold. First, we design a pipeline that helps build our Arabic multimodal dataset, leveraging both state-of-the-art transformers and feature extraction tools within word alignment techniques. Thereafter, we validate our dataset using a state-of-the-art transformer-based model dealing with multimodality. Despite the small size of the resulting dataset, experiments show that Arabic multimodality is very promising.
https://arxiv.org/abs/2311.12986
This paper describes the system of the LowResource Team for Task 2 of BLP-2023, which involves conducting sentiment analysis on a dataset composed of public posts and comments from diverse social media platforms. Our primary aim is to utilize BanglaBert, a BERT model pre-trained on a large Bangla corpus, using various strategies including fine-tuning, dropping random tokens, and using several external datasets. Our final model is an ensemble of the three best BanglaBert variations. Our system ranked 3rd overall on the test set among 30 participating teams with a score of 0.718. Additionally, we discuss promising systems that did not perform well, namely task-adaptive pretraining and paraphrasing using BanglaT5. The training code and external datasets used for our system are publicly available at this https URL
https://arxiv.org/abs/2311.12735
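Two of the ingredients mentioned above, sketched under assumptions: random token dropping as a simple augmentation, and averaging the logits of already fine-tuned ensemble members at prediction time. Model loading and fine-tuning are elided, and the whitespace-level dropping is an illustrative simplification.

```python
import random
import torch

def drop_random_tokens(text: str, p: float = 0.1) -> str:
    """Randomly remove a fraction of whitespace tokens from a training example."""
    tokens = text.split()
    kept = [t for t in tokens if random.random() > p]
    return " ".join(kept) if kept else text

def ensemble_logits(models, tokenizer, text: str) -> torch.Tensor:
    """Average the class logits of several fine-tuned variants (assumed already trained)."""
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = torch.stack([m(**batch).logits for m in models])
    return logits.mean(dim=0)
```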
The rising influence of social media platforms in various domains, including tourism, has highlighted the growing need for efficient and automated natural language processing (NLP) approaches to take advantage of this valuable resource. However, the transformation of multilingual, unstructured, and informal texts into structured knowledge often poses significant challenges. In this work, we evaluate and compare few-shot, pattern-exploiting and fine-tuning machine learning techniques on large multilingual language models (LLMs) to establish the best strategy to address the lack of annotated data for 3 common NLP tasks in the tourism domain: (1) Sentiment Analysis, (2) Named Entity Recognition, and (3) Fine-grained Thematic Concept Extraction (linked to a semantic resource). Furthermore, we aim to ascertain the quantity of annotated examples required to achieve good performance in those 3 tasks, addressing a common challenge encountered by NLP researchers in the construction of domain-specific datasets. Extensive experimentation on a newly collected and annotated multilingual (French, English, and Spanish) dataset composed of tourism-related tweets shows that current few-shot learning techniques allow us to obtain competitive results for all three tasks with very little annotation data: 5 tweets per label (15 in total) for Sentiment Analysis, 10% of the tweets for location detection (around 160) and 13% (200 approx.) of the tweets annotated with thematic concepts, a highly fine-grained sequence labeling task based on an inventory of 315 classes. This comparative analysis, grounded in a novel dataset, paves the way for applying NLP to new domain-specific applications, reducing the need for manual annotations and circumventing the complexities of rule-based, ad hoc solutions.
https://arxiv.org/abs/2311.14727
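A sketch of the few-shot sentiment setting (5 demonstrations per label, 15 in total) assembled into a single prompt for a generative model. complete is a hypothetical LLM call, and the demonstration tweets below are placeholders rather than the annotated data.

```python
def complete(prompt: str) -> str:
    # hypothetical: plug in an LLM completion call
    raise NotImplementedError

demos = {
    "positive": ["Loved the old town, best trip ever!"],
    "neutral":  ["The museum opens at 9am according to the website."],
    "negative": ["Two hours queuing and the castle was closed."],
}  # in the paper's setting each label gets 5 real annotated tweets

def few_shot_sentiment(tweet: str) -> str:
    lines = [f"Tweet: {t}\nSentiment: {label}" for label, ts in demos.items() for t in ts]
    prompt = "\n\n".join(lines) + f"\n\nTweet: {tweet}\nSentiment:"
    return complete(prompt).strip()
```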
Large language models (LLMs) have showcased their capability for few-shot inference, known as in-context learning. However, in-domain demonstrations are not always readily available in real scenarios, leading to cross-domain in-context learning. Besides, LLMs still face challenges with long-tail knowledge in unseen and unfamiliar domains. The above limitations demonstrate the necessity of Unsupervised Domain Adaptation (UDA). In this paper, we study the UDA problem under an in-context learning setting to adapt language models from the source domain to the target domain without any target labels. The core idea is to retrieve a subset of cross-domain elements that are the most similar to the query, and elicit the language model to adapt in an in-context manner by learning both the target-domain distribution and the discriminative task signal simultaneously with the augmented cross-domain in-context examples. We devise different prompting and training strategies, accounting for different LM architectures, to learn the target distribution via language modeling. With extensive experiments on Sentiment Analysis (SA) and Named Entity Recognition (NER) tasks, we thoroughly study the effectiveness of ICL for domain transfer and demonstrate significant improvements over baseline models.
https://arxiv.org/abs/2311.11551
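A sketch of the retrieval step in this cross-domain in-context setting, under stated assumptions: embed and complete are hypothetical stand-ins for a sentence encoder and an LLM call, and the paper's prompting and training strategies are richer than this.

```python
import numpy as np

def embed(texts):
    # hypothetical sentence encoder -> (n, d) array
    raise NotImplementedError

def complete(prompt: str) -> str:
    # hypothetical LLM call
    raise NotImplementedError

def cross_domain_icl(query, source_texts, source_labels, k=4):
    """Retrieve the k source-domain examples most similar to the target-domain query
    and use them as in-context demonstrations."""
    q = embed([query])[0]
    S = embed(source_texts)
    sims = S @ q / (np.linalg.norm(S, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]
    demos = "\n".join(f"Text: {source_texts[i]}\nLabel: {source_labels[i]}" for i in top)
    return complete(f"{demos}\nText: {query}\nLabel:").strip()
```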
In March 2020, the World Health Organisation declared COVID-19 a global pandemic as it spread to nearly every country. By mid-2021, India had introduced three vaccines: Covishield, Covaxin, and Sputnik. To ensure successful vaccination in a densely populated country like India, understanding public sentiment was crucial. Social media, particularly Reddit with over 430 million users, played a vital role in disseminating information. This study employs data mining techniques to analyze Reddit data and gauge Indian sentiments towards COVID-19 vaccines. Using Python's TextBlob library, comments are annotated to assess general sentiments. Results show that most Reddit users in India expressed neutrality about vaccination, posing a challenge for the Indian government's efforts to vaccinate a significant portion of the population.
https://arxiv.org/abs/2311.11435
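A sketch of the annotation step with TextBlob, mapping polarity scores to positive/neutral/negative labels; the thresholds here are illustrative, not necessarily the study's exact choice.

```python
from textblob import TextBlob

def label_comment(text: str, margin: float = 0.05) -> str:
    polarity = TextBlob(text).sentiment.polarity  # in [-1, 1]
    if polarity > margin:
        return "positive"
    if polarity < -margin:
        return "negative"
    return "neutral"

print(label_comment("Covishield worked fine for me, mild fever for a day."))
```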
Sentiment analysis (SA) is an emerging field in text mining. It is the process of computationally identifying and categorizing opinions expressed in a piece of text across different social media platforms. Social media plays an essential role in knowing the customer mindset towards products, services, and the latest market trends. Most organizations depend on customers' responses and feedback to upgrade their offered products and services. SA or opinion mining is a promising research area for various domains. It plays a vital role in analyzing the big data generated daily in structured and unstructured formats over the internet. This survey paper defines sentiment and reviews its recent research and development in different domains, including voice, images, videos, and text. The challenges and opportunities of sentiment analysis are also discussed. Keywords: Sentiment Analysis, Machine Learning, Lexicon-based approach, Deep Learning, Natural Language Processing
https://arxiv.org/abs/2311.11250
A multi-modal emotion recognition method was established by combining a two-channel convolutional neural network with a ring network. This method can extract emotional information effectively and improve learning efficiency. The words were vectorized with GloVe, and the word vectors were fed into the convolutional neural network. Combining an attention mechanism with a max-pooling BiSRU channel, local deep emotion features and pre-post sequential emotion semantics are obtained. Finally, multiple features are fused and used as input for emotion polarity prediction, so as to achieve sentiment analysis of the target. Experiments show that the emotion analysis method based on feature fusion can effectively improve recognition accuracy on emotion datasets and reduce the learning time. The model has a certain degree of generalization.
https://arxiv.org/abs/2311.11237
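A hedged architectural sketch of the two-channel idea in PyTorch: a convolutional channel for local features and a bidirectional recurrent channel (a GRU here as a stand-in for BiSRU) with attention pooling, fused before the final polarity classifier. Layer sizes and the exact fusion are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TwoChannelSentiment(nn.Module):
    def __init__(self, vocab_size=20000, emb_dim=100, hidden=128, n_classes=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)        # load GloVe weights here
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)
        self.rnn = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(hidden + 2 * hidden, n_classes)

    def forward(self, token_ids):                            # (batch, seq_len)
        x = self.emb(token_ids)                              # (batch, seq, emb)
        local = torch.relu(self.conv(x.transpose(1, 2)))     # (batch, hidden, seq)
        local = local.max(dim=2).values                      # max pooling over time
        seq, _ = self.rnn(x)                                  # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(seq), dim=1)        # attention over positions
        context = (weights * seq).sum(dim=1)                  # (batch, 2*hidden)
        return self.out(torch.cat([local, context], dim=1))   # fused feature -> polarity

logits = TwoChannelSentiment()(torch.randint(0, 20000, (2, 16)))
print(logits.shape)  # torch.Size([2, 3])
```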
When traveling to an unfamiliar city for holidays, tourists often rely on guidebooks, travel websites, or recommendation systems to plan their daily itineraries and explore popular points of interest (POIs). However, these approaches may lack optimization in terms of time feasibility, localities, and user preferences. In this paper, we propose the SBTRec algorithm: a BERT-based Trajectory Recommendation with sentiment analysis, for recommending personalized sequences of POIs as itineraries. The key contributions of this work include analyzing users' check-ins and uploaded photos to understand the relationship between POI visits and distance. We introduce SBTRec, which encompasses sentiment analysis to improve recommendation accuracy by understanding users' preferences and satisfaction levels from reviews and comments about different POIs. Our proposed algorithms are evaluated against other sequence prediction methods using datasets from 8 cities. The results demonstrate that SBTRec achieves an average F1 score of 61.45%, outperforming baseline algorithms. The paper further discusses the flexibility of the SBTRec algorithm, its ability to adapt to different scenarios and cities without modification, and its potential for extension by incorporating additional information for more reliable predictions. Overall, SBTRec provides personalized and relevant POI recommendations, enhancing tourists' overall trip experiences. Future work includes fine-tuning personalized embeddings for users, with evaluation of users' comments on POIs, to further enhance prediction accuracy.
https://arxiv.org/abs/2311.11071