This manuscript presents a systematic review of the use of Artificial Intelligence for assessing emotions in healthcare-related texts, with a particular focus on Natural Language Processing and deep learning techniques. We examine numerous studies that employ AI to improve sentiment analysis, categorize emotions, and predict patient outcomes from textual data drawn from clinical narratives, patient feedback on medications, and online health discussions. The review documents notable progress in the accuracy of sentiment classification algorithms, the predictive power of AI models for neurodegenerative diseases, and the development of AI-driven clinical decision-support systems. Notably, AI applications have improved personalized therapy plans by incorporating patient sentiment and have contributed to the early identification of mental health disorders. Challenges persist, including ensuring the ethical application of AI, safeguarding patient confidentiality, and addressing potential biases in algorithms. Nevertheless, AI's potential to transform healthcare practice is clear, pointing toward care that is not only better informed and more efficient but also more empathetic and patient-centered. This review underscores AI's transformative influence on healthcare, offering a comprehensive account of its role in analyzing emotional content in healthcare texts and highlighting the trajectory toward more compassionate patient care. The findings argue for a harmonious synergy between AI's analytical capabilities and the human aspects of healthcare.
https://arxiv.org/abs/2403.09762
As more than 70% of reviews in existing opinion summarization datasets are positive, current opinion summarization approaches are reluctant to generate negative summaries when given negative input texts. To address this sentiment bias, a direct approach that avoids over-reliance on a specific framework is to generate additional data with large language models to balance the emotional distribution of the dataset. However, data augmentation based on large language models has two disadvantages: 1) potential issues or toxicity in the augmented data; 2) high cost. In this paper, we therefore propose a novel data augmentation framework based on both large and small language models for debiasing opinion summarization. Specifically, a small set of synthesized negative reviews is obtained by rewriting positive texts with a large language model. A disentanglement reconstruction model is then trained on the generated data. After training, a large amount of synthetic data can be obtained by decoding new representations formed from combinations of different sample representations, followed by filtering based on confusion degree and sentiment classification. Experiments show that our framework alleviates sentiment bias as effectively as using only large models, but more economically.
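The final filtering stage described above can be sketched as follows. Both scoring functions here are hypothetical stand-ins: the paper's actual filters are a trained sentiment classifier and a confusion-degree measure, neither of which is specified in the abstract.

```python
# Hedged sketch: keep a synthetic review only if a sentiment scorer agrees
# with its intended (negative) label and its "confusion degree" is low.
# Thresholds and scorers are illustrative assumptions, not the paper's values.

def filter_synthetic(reviews, sentiment_score, confusion_degree,
                     sent_threshold=0.5, conf_threshold=0.3):
    kept = []
    for text in reviews:
        neg_prob = sentiment_score(text)    # P(negative) from a classifier
        confusion = confusion_degree(text)  # how incoherent the decoded text is
        if neg_prob >= sent_threshold and confusion <= conf_threshold:
            kept.append(text)
    return kept

# Toy stand-ins for demonstration only.
fake_sent = lambda t: 0.9 if "bad" in t else 0.1
fake_conf = lambda t: 0.1
print(filter_synthetic(["bad battery", "lovely screen"], fake_sent, fake_conf))
# ['bad battery']
```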
https://arxiv.org/abs/2403.07693
The lack of a suitable tool for analyzing conversational texts in Persian has made many analyses of these texts, including sentiment analysis, difficult. In this research, we make such texts easier for machines to understand by providing PSC, the Persian Slang Converter, a tool for converting conversational texts into formal ones, and by combining PSC with up-to-date deep learning methods to improve sentiment learning on short Persian texts. More than 10 million unlabeled texts from various social networks and movie subtitles (as conversational texts) and about 10 million news texts (as formal texts) were used to train the unsupervised models and to implement the formalization tool. 60,000 texts from comments of Instagram users, with positive, negative, and neutral labels, serve as supervised data for training the short-text sentiment classification model. Using the formalization tool, 57% of the words in the conversational corpus were converted. Finally, using the formalizer, a FastText model, and a deep LSTM network, an accuracy of 81.91% was obtained on the test data.
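At the token level, a slang-to-formal conversion step behaves like a dictionary substitution with a coverage statistic (the abstract's 57% figure is such a coverage). A minimal sketch, using a hypothetical English mapping for readability; PSC itself targets Persian slang and is certainly more sophisticated than a lookup table:

```python
# Hedged sketch of token-level formalization. The mapping is a made-up
# illustration, not PSC's actual rules.
SLANG_TO_FORMAL = {
    "gonna": "going to",
    "wanna": "want to",
    "u": "you",
}

def formalize(text):
    """Return the converted text and the fraction of tokens converted."""
    tokens = text.split()
    converted = [SLANG_TO_FORMAL.get(t, t) for t in tokens]
    coverage = sum(t in SLANG_TO_FORMAL for t in tokens) / len(tokens)
    return " ".join(converted), coverage

text, cov = formalize("u gonna like it")
print(text)  # 'you going to like it'
print(cov)   # 0.5
```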
https://arxiv.org/abs/2403.06023
Introduction: Microblogging websites have amassed rich data sources for sentiment analysis and opinion mining. Sentiment classification on them has frequently proven difficult because microblog posts typically lack syntactically consistent terms and representative phrases, since users of these social networks do not like to write lengthy statements. There are also limitations specific to low-resource languages. The Persian language has exceptional characteristics and demands its own annotated data and models for the sentiment analysis task, distinct from the textual features of English. Method: This paper first constructs a user opinion dataset called ITRC-Opinion, built in a collaborative, in-house environment. The dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram. Second, this study proposes a new architecture based on a convolutional neural network (CNN) for more effective sentiment analysis of colloquial text in social microblog posts. The constructed dataset is used to evaluate the proposed architecture. Furthermore, models such as LSTM, CNN-RNN, BiLSTM, and BiGRU with different word embeddings, including FastText, GloVe, and Word2vec, were evaluated on our dataset. Results: The results demonstrate the benefit of our dataset and the proposed model (72% accuracy), showing a meaningful improvement in sentiment classification performance.
https://arxiv.org/abs/2306.12679
In this study, ChatGPT is utilized to create streamlined models that generate easily interpretable features. These features are then used to evaluate financial outcomes from earnings calls. We detail a training approach that merges knowledge distillation and transfer learning, resulting in lightweight topic and sentiment classification models without significant loss in accuracy. These models are assessed through a dataset annotated by experts. The paper also delves into two practical case studies, highlighting how the generated features can be effectively utilized in quantitative investing scenarios.
https://arxiv.org/abs/2403.02185
In this paper we explore the challenges of measuring sentiment in Environmental, Social and Governance (ESG) related social media. ESG has grown in importance in recent years with a surge of interest from the financial sector, and the performance of many businesses now rests in part on their ESG-related reputations. The use of sentiment analysis to measure ESG-related reputation has developed, and with it, interest in using machines to do so. The era of digital media has created an explosion of new media sources, driven by the growth of social media platforms. This growing data environment has become an excellent source for behavioural insight studies across many disciplines, including politics, healthcare and market research. Our study compares human performance with the cutting edge in machine performance at measuring ESG-related sentiment. To this end, researchers classify the sentiment of 150 tweets and a reliability measure is taken. A gold-standard dataset is then established based on the consensus of three researchers, and this dataset is used to measure the performance of different machine approaches: one based on the VADER dictionary approach to sentiment classification, and multiple language model approaches, including Llama2, T5, Mistral, Mixtral, FINBERT, GPT3.5 and GPT4.
https://arxiv.org/abs/2402.16650
This paper explores the challenges posed by aspect-based sentiment classification (ABSC) within pretrained language models (PLMs), with a particular focus on contextualization and hallucination issues. In order to tackle these challenges, we introduce CARBD-Ko (a Contextually Annotated Review Benchmark Dataset for Aspect-Based Sentiment Classification in Korean), a benchmark dataset that incorporates aspects and dual-tagged polarities to distinguish between aspect-specific and aspect-agnostic sentiment classification. The dataset consists of sentences annotated with specific aspects, aspect polarity, aspect-agnostic polarity, and the intensity of aspects. To address the issue of dual-tagged aspect polarities, we propose a novel approach employing a Siamese Network. Our experimental findings highlight the inherent difficulties in accurately predicting dual-polarities and underscore the significance of contextualized sentiment analysis models. The CARBD-Ko dataset serves as a valuable resource for future research endeavors in aspect-level sentiment classification.
https://arxiv.org/abs/2402.15046
In the rapidly evolving landscape of social media, the introduction of new emojis in Unicode release versions presents a structured opportunity to explore digital language evolution. Analyzing a large dataset of sampled English tweets, we examine how newly released emojis gain traction and evolve in meaning. We find that community size of early adopters and emoji semantics are crucial in determining their popularity. Certain emojis experienced notable shifts in the meanings and sentiment associations during the diffusion process. Additionally, we propose a novel framework utilizing language models to extract words and pre-existing emojis with semantically similar contexts, which enhances interpretation of new emojis. The framework demonstrates its effectiveness in improving sentiment classification performance by substituting unknown new emojis with familiar ones. This study offers a new perspective in understanding how new language units are adopted, adapted, and integrated into the fabric of online communication.
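The substitution step of the framework above can be sketched as a character-level replacement before sentiment scoring. The similarity table here is a hypothetical illustration; the paper derives such mappings from language-model context similarity, not from a hand-written dictionary:

```python
# Hedged sketch: replace an unknown new emoji with a semantically similar,
# familiar one so a downstream sentiment classifier can handle it.
# The mapping below is made up for illustration.
SIMILAR = {
    "🫠": "😅",  # hypothetical nearest known neighbour
    "🫡": "👍",
}

def normalize(text):
    """Substitute unknown new emojis with familiar near-neighbours."""
    return "".join(SIMILAR.get(ch, ch) for ch in text)

print(normalize("thanks 🫡"))  # 'thanks 👍'
```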
https://arxiv.org/abs/2402.14187
Aspect-Based Sentiment Analysis (ABSA) is a fine-grained linguistic problem that entails the extraction of multifaceted aspects, opinions, and sentiments from a given text. Both standalone and compound ABSA tasks have been used extensively in the literature to examine the nuanced information present in online reviews and social media posts. Current ABSA methods often rely on static hyperparameters for attention-masking mechanisms, which can struggle to adapt to context and may overlook the varying relevance of words in different situations. This makes it hard to accurately analyze complex sentences containing multiple aspects with differing sentiments. In this work, we present adaptive masking methods that remove irrelevant tokens based on context to assist the Aspect Term Extraction and Aspect Sentiment Classification subtasks of ABSA. Our experiments show that the proposed methods outperform the baseline methods in accuracy and F1 score on four benchmark online review datasets. Further, we show that the proposed methods can be extended with multiple adaptations, and we present a qualitative analysis of the proposed approach using sample text for aspect term extraction.
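One plausible reading of context-adaptive masking, sketched with illustrative attention weights (not model output): a token is kept only if its weight clears a threshold that scales with the context's mean weight, rather than a fixed hyperparameter. This is an assumption about the mechanism, not the paper's exact rule:

```python
# Hedged sketch of adaptive masking: the cut-off adapts to each sentence's
# weight distribution instead of being a static hyperparameter.
def adaptive_mask(tokens, weights, ratio=0.5):
    threshold = ratio * (sum(weights) / len(weights))  # context-dependent
    return [t for t, w in zip(tokens, weights) if w >= threshold]

tokens  = ["the", "battery", "life", "is", "terrible"]
weights = [0.02, 0.30, 0.25, 0.03, 0.40]   # illustrative attention to the aspect
print(adaptive_mask(tokens, weights))  # ['battery', 'life', 'terrible']
```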
https://arxiv.org/abs/2402.13722
The ability to generate sentiment-controlled feedback in response to multimodal inputs comprising both text and images addresses a critical gap in human-computer interaction by enabling systems to provide empathetic, accurate, and engaging responses. This capability has profound applications in healthcare, marketing, and education. To this end, we construct a large-scale Controllable Multimodal Feedback Synthesis (CMFeed) dataset and propose a controllable feedback synthesis system. The proposed system includes an encoder, decoder, and controllability block for textual and visual inputs. It extracts textual and visual features using transformer and Faster R-CNN networks and combines them to generate feedback. The CMFeed dataset encompasses images, text, reactions to the post, human comments with relevance scores, and reactions to the comments. The reactions to posts and comments are used to train the proposed model to produce feedback with a particular (positive or negative) sentiment. A sentiment classification accuracy of 77.23% is achieved, 18.82% higher than without the controllability block. Moreover, the system incorporates a similarity module for assessing feedback relevance through rank-based metrics, and an interpretability technique to analyze the contribution of textual and visual features during the generation of uncontrolled and controlled feedback.
https://arxiv.org/abs/2402.07640
Recent work has shown that 01 loss sign activation neural networks can defend against image classification adversarial attacks; a public challenge to attack the models on the CIFAR10 dataset remains undefeated. In this study we ask: are 01 loss sign activation neural networks hard to deceive with TextFooler, a popular black-box text adversarial attack program? We study this question on four popular text classification datasets: IMDB reviews, Yelp reviews, MR sentiment classification, and AG news classification. We find that our 01 loss sign activation network is much harder to attack with TextFooler than sigmoid activation cross-entropy and binary neural networks. We also study a 01 loss sign activation convolutional neural network with a novel global pooling step specific to sign activation networks. With this new variation we see a significant gain in adversarial accuracy, rendering TextFooler practically useless against it. We make our code freely available at \url{this https URL} and \url{this https URL}. Our work suggests that 01 loss sign activation networks could be further developed to create foolproof models against text adversarial attacks.
https://arxiv.org/abs/2402.07347
One of the challenges of natural language understanding is to deal with the subjectivity of sentences, which may express opinions and emotions that add layers of complexity and nuance. Sentiment analysis is a field that aims to extract and analyze these subjective elements from text, and it can be applied at different levels of granularity, such as document, paragraph, sentence, or aspect. Aspect-based sentiment analysis is a well-studied topic with many available data sets and models. However, there is no clear definition of what makes a sentence difficult for aspect-based sentiment analysis. In this paper, we explore this question by conducting an experiment with three data sets: "Laptops", "Restaurants", and "MTSC" (Multi-Target-dependent Sentiment Classification), and a merged version of these three datasets. We study the impact of domain diversity and syntactic diversity on difficulty. We use a combination of classifiers to identify the most difficult sentences and analyze their characteristics. We employ two ways of defining sentence difficulty. The first one is binary and labels a sentence as difficult if the classifiers fail to correctly predict the sentiment polarity. The second one is a six-level scale based on how many of the top five best-performing classifiers can correctly predict the sentiment polarity. We also define 9 linguistic features that, combined, aim at estimating the difficulty at sentence level.
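The two difficulty definitions above can be sketched directly. The classifier predictions below are hypothetical; in the paper they come from a combination of trained sentiment classifiers:

```python
# Hedged sketch of the two difficulty definitions from the abstract.
def binary_difficulty(gold, predictions):
    """Difficult if the classifiers fail to predict the gold polarity.
    (One reading of the definition: all classifiers fail.)"""
    return all(p != gold for p in predictions)

def scaled_difficulty(gold, top5_predictions):
    """Six-level scale (0-5): how many of the top-5 classifiers fail."""
    assert len(top5_predictions) == 5
    return sum(p != gold for p in top5_predictions)

print(binary_difficulty("pos", ["neg", "neg", "neg"]))          # True
print(scaled_difficulty("pos", ["pos", "neg", "pos", "pos", "neg"]))  # 2
```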
https://arxiv.org/abs/2402.03163
Understanding the influence of inputs on the output is useful across many tasks. This work provides an information-theoretic framework for analysing the influence of inputs in text classification tasks. Natural language processing (NLP) tasks take either a single element or multiple elements as input to predict an output variable, where an element is a block of text. Each text element has two components: an associated semantic meaning and a linguistic realisation. Multiple-choice reading comprehension (MCRC) and sentiment classification (SC) are selected to showcase the framework. For MCRC, we find that the influence of the context on the output, relative to that of the question, decreases on more challenging datasets. In particular, more challenging contexts allow a greater variation in question complexity. Hence, test creators need to consider the choice of context carefully when designing multiple-choice questions for assessment. For SC, we find that the semantic meaning of the input text dominates its linguistic realisation (above 80\% for all datasets considered) when determining the sentiment. The framework is made available at: this https URL
https://arxiv.org/abs/2402.00978
We report results of a longitudinal sentiment classification of Reddit posts written by students of four major Canadian universities. We work with the texts of the posts, concentrating on the years 2020-2023. By finely tuning a sentiment threshold to a range of [-0.075,0.075], we successfully built classifiers proficient in categorizing post sentiments into positive and negative categories. Noticeably, our sentiment classification results are consistent across the four university data sets.
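The thresholding scheme above reduces to mapping a compound sentiment score to a label, with the tuned band treated as neutral. A minimal sketch; the scores are illustrative and the underlying scorer (e.g. a lexicon-based compound score in [-1, 1]) is an assumption:

```python
# Hedged sketch: classify a compound score with the [-0.075, 0.075]
# neutral band the abstract reports.
def classify(score, low=-0.075, high=0.075):
    """Map a compound sentiment score in [-1, 1] to a polarity label."""
    if score > high:
        return "positive"
    if score < low:
        return "negative"
    return "neutral"   # inside the tuned band

scores = [0.42, -0.31, 0.05]
print([classify(s) for s in scores])  # ['positive', 'negative', 'neutral']
```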
https://arxiv.org/abs/2401.12382
Analyzing authors' sentiments in texts as a technique for identifying text polarity can be practical and useful in various fields, including medicine and dentistry. Currently, due to factors such as patients' limited knowledge about their condition, difficulties in accessing specialist doctors, or fear of illness, particularly in pandemic conditions, there might be a delay between receiving a radiology report and consulting a doctor. In some cases, this delay can pose significant risks to the patient, making timely decision-making crucial. An automatic system that can inform patients of a deterioration in their condition by analyzing the text of radiology reports could greatly support timely decision-making. In this study, a dataset comprising 1,134 cone-beam computed tomography (CBCT) imaging reports was collected from the Shiraz University of Medical Sciences. Each case was examined, and an expert labeled a severity level for the patient's condition on each document. After preprocessing all the text data, a deep learning model combining Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) architectures, known as CNN-LSTM, was developed to detect the severity level of the patient's problem through sentiment analysis of the radiologist's report. The model's performance was evaluated on two datasets, with two and four classes respectively, in both imbalanced and balanced scenarios. Finally, to demonstrate the effectiveness of our model, we compared its performance with that of other classification models. The results, along with one-way ANOVA and Tukey's test, indicated that our proposed model (CNN-LSTM) performed best on precision, recall, and F-measure criteria. This suggests that it can be a reliable model for estimating the severity of oral and dental diseases, thereby assisting patients.
https://arxiv.org/abs/2401.12993
Instruction-tuned large language models (LLMs) excel at many tasks and will even provide explanations for their behavior. Since these models are directly accessible to the public, there is a risk that convincing but wrong explanations can lead to unsupported confidence in LLMs. Therefore, the interpretability-faithfulness of self-explanations is an important consideration for AI safety. Assessing the faithfulness of these explanations, termed self-explanations, is challenging, as the models are too complex for humans to annotate what a correct explanation is. To address this, we propose employing self-consistency checks as a measure of faithfulness. For example, if an LLM says a set of words is important for making a prediction, then it should not be able to make the same prediction without these words. While self-consistency checks are a common approach to faithfulness, they have not previously been applied to LLMs' self-explanations. We apply self-consistency checks to three types of self-explanations: counterfactuals, importance measures, and redactions. Our work demonstrates that faithfulness is both task and model dependent: e.g., for sentiment classification, counterfactual explanations are more faithful for Llama2, importance measures for Mistral, and redaction for Falcon 40B. Finally, our findings are robust to prompt variations.
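The self-consistency check for importance-measure explanations can be sketched with a toy keyword classifier standing in for the LLM: if removing the claimed-important words leaves the prediction unchanged, the explanation is not faithful. The classifier and examples are illustrative assumptions:

```python
# Hedged sketch of a self-consistency (faithfulness) check on an
# importance-measure self-explanation.
def toy_classifier(text):
    """Stand-in for an LLM's sentiment prediction."""
    return "positive" if "great" in text or "love" in text else "negative"

def consistency_check(text, claimed_important, classify):
    """True iff removing the claimed-important words flips the prediction."""
    original = classify(text)
    redacted = " ".join(w for w in text.split() if w not in claimed_important)
    return classify(redacted) != original

print(consistency_check("a great film", {"great"}, toy_classifier))  # True
print(consistency_check("a great film", {"film"}, toy_classifier))   # False
```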
https://arxiv.org/abs/2401.07927
Large Language Models (LLMs) have demonstrated superior abilities in tasks such as chatting, reasoning, and question-answering. However, standard LLMs may ignore crucial paralinguistic information, such as sentiment, emotion, and speaking style, which is essential for achieving natural, human-like spoken conversation, especially when such information is conveyed by acoustic cues. We therefore propose the Paralinguistics-enhanced Generative Pretrained Transformer (ParalinGPT), an LLM that utilizes text and speech modalities to better model the linguistic content and paralinguistic attributes of spoken responses. The model takes the conversational text context, speech embeddings, and paralinguistic attributes as input prompts within a serialized multitasking multimodal framework. Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation, with autoregressive conditioning. We use the Switchboard-1 corpus, with its sentiment labels as the paralinguistic attribute, as our spoken dialogue dataset. Experimental results indicate that the proposed serialized multitasking method outperforms typical sequence classification techniques on current and response sentiment classification. Furthermore, leveraging conversational context and speech embeddings significantly improves both response text generation and sentiment prediction. Our proposed framework achieves relative improvements of 6.7%, 12.0%, and 3.5% in current sentiment accuracy, response sentiment accuracy, and response text BLEU score, respectively.
https://arxiv.org/abs/2312.15316
Existing PTLM-based models for TSC can be categorized into two groups: 1) fine-tuning-based models that adopt PTLM as the context encoder; 2) prompting-based models that transfer the classification task to the text/word generation task. In this paper, we present a new perspective of leveraging PTLM for TSC: simultaneously leveraging the merits of both language modeling and explicit target-context interactions via contextual target attributes. Specifically, we design the domain- and target-constrained cloze test, which can leverage the PTLMs' strong language modeling ability to generate the given target's attributes pertaining to the review context. The attributes contain the background and property information of the target, which can help to enrich the semantics of the review context and the target. To exploit the attributes for tackling TSC, we first construct a heterogeneous information graph by treating the attributes as nodes and combining them with (1) the syntax graph automatically produced by the off-the-shelf dependency parser and (2) the semantics graph of the review context, which is derived from the self-attention mechanism. Then we propose a heterogeneous information gated graph convolutional network to model the interactions among the attribute information, the syntactic information, and the contextual information. The experimental results on three benchmark datasets demonstrate the superiority of our model, which achieves new state-of-the-art performance.
https://arxiv.org/abs/2312.13766
Sentiment analysis methods are rapidly being adopted by the field of Urban Design and Planning, for the crowdsourced evaluation of urban environments. However, most models used within this domain are able to identify positive or negative sentiment associated with a textual appraisal as a whole, without inferring information about specific urban aspects contained within it, or the sentiment associated with them. While Aspect Based Sentiment Analysis (ABSA) is becoming increasingly popular, most existing ABSA models are trained on non-urban themes such as restaurants, electronics, consumer goods and the like. This body of research develops an ABSA model capable of extracting urban aspects contained within geo-located textual urban appraisals, along with corresponding aspect sentiment classification. We annotate a dataset of 2500 crowdsourced reviews of public parks, and train a Bidirectional Encoder Representations from Transformers (BERT) model with Local Context Focus (LCF) on this data. Our model achieves significant improvement in prediction accuracy on urban reviews, for both Aspect Term Extraction (ATE) and Aspect Sentiment Classification (ASC) tasks. For demonstrative analysis, positive and negative urban aspects across Boston are spatially visualized. We hope that this model is useful for designers and planners for fine-grained urban sentiment evaluation.
https://arxiv.org/abs/2312.12253
Aspect-based sentiment analysis (ABSA), a fine-grained sentiment classification task, has received much attention recently. Many works investigate sentiment information through opinion words, such as ''good'' and ''bad''. However, implicit sentiment is widespread in ABSA datasets: sentences that contain no distinct opinion words but still express sentiment toward the aspect term. To deal with implicit sentiment, this paper proposes ABSA-ESA, an ABSA method that integrates explicit sentiment augmentations, together with an ABSA-specific augmentation method to create them. Specifically, we post-train T5 on rule-based data. We employ Syntax Distance Weighting and Unlikelihood Contrastive Regularization during training to guide the model toward generating explicit sentiment, and we use Constrained Beam Search to ensure the augmented sentence contains the aspect terms. We test ABSA-ESA on two of the most popular ABSA benchmarks. The results show that ABSA-ESA outperforms the SOTA baselines on implicit and explicit sentiment accuracy.
https://arxiv.org/abs/2312.10961