Mixup is an effective data augmentation method that generates new augmented samples as linear combinations of different original samples. However, if the original samples contain noise or aberrant features, Mixup may propagate them to the augmented samples, making the model over-sensitive to these outliers. To solve this problem, this paper proposes a new Mixup method called AMPLIFY. This method uses the Attention mechanism of the Transformer itself to reduce the influence of noise and aberrant values in the original samples on the prediction results, without adding trainable parameters and at very low computational cost, thereby avoiding the high resource consumption of common Mixup methods such as Sentence Mixup. The experimental results show that, at a lower computational cost, AMPLIFY outperforms other Mixup methods in text classification tasks on 7 benchmark datasets, providing new ideas and new ways to further improve the performance of pre-trained models based on the Attention mechanism, such as BERT, ALBERT, RoBERTa, and GPT. Our code can be obtained at this https URL.
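To make the mechanism concrete, below is a minimal sketch of attention-guided hidden-state Mixup in PyTorch. The layer at which AMPLIFY mixes representations and its exact weighting scheme are assumptions here, not the paper's published procedure; the sketch only illustrates how the model's own attention mass can down-weight noisy tokens during mixing.

```python
# A minimal sketch, assuming mixing happens on per-token Transformer hidden
# states and that each token's received attention mass acts as a noise filter.
import torch

def attention_mixup(h_a, h_b, attn_a, attn_b, alpha=0.2):
    """Mix two samples' hidden states, down-weighting low-attention tokens.

    h_a, h_b:       (seq_len, dim) hidden states of two original samples
    attn_a, attn_b: (seq_len,) attention mass each token receives, taken
                    from the model's own attention maps (no new parameters)
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    # Normalize attention into token weights; outlier tokens that receive
    # little attention contribute less to the augmented sample.
    w_a = attn_a / attn_a.sum() * h_a.size(0)
    w_b = attn_b / attn_b.sum() * h_b.size(0)
    mixed = lam * h_a * w_a.unsqueeze(-1) + (1 - lam) * h_b * w_b.unsqueeze(-1)
    return mixed, lam

# As in standard Mixup, labels are mixed with the same coefficient:
# loss = lam * ce(logits, y_a) + (1 - lam) * ce(logits, y_b)
```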
https://arxiv.org/abs/2309.12689
Prompt Tuning is emerging as a scalable and cost-effective method to fine-tune Pretrained Language Models (PLMs). This study benchmarks the performance and computational efficiency of Prompt Tuning and baseline methods on a multi-label text classification task. This is applied to the use case of classifying companies into an investment firm's proprietary industry taxonomy, supporting their thematic investment strategy. Text-to-text classification with PLMs is frequently reported to outperform classification with a classification head, but has several limitations when applied to a multi-label classification problem where each label consists of multiple tokens: (a) Generated labels may not match any label in the industry taxonomy; (b) During fine-tuning, multiple labels must be provided in an arbitrary order; (c) The model provides a binary decision for each label, rather than an appropriate confidence score. Limitation (a) is addressed by applying constrained decoding using Trie Search, which slightly improves classification performance. All limitations (a), (b), and (c) are addressed by replacing the PLM's language head with a classification head. This improves performance significantly, while also reducing computational costs during inference. The results indicate the continuing need to adapt state-of-the-art methods to domain-specific tasks, even in the era of PLMs with strong generalization abilities.
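As an illustration of the constrained-decoding fix for limitation (a), a small trie over tokenized labels can restrict generation to valid taxonomy entries. The `LabelTrie` class and toy token IDs below are a sketch; the interface mirrors the shape of Hugging Face's `prefix_allowed_tokens_fn` generation hook, though the paper's exact implementation may differ.

```python
# A minimal sketch of Trie Search over a label taxonomy, assuming each label
# is pre-tokenized into a sequence of token IDs.
class LabelTrie:
    def __init__(self, label_token_seqs):
        self.root = {}
        for seq in label_token_seqs:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed_next(self, prefix):
        """Token IDs that can extend `prefix` toward a valid label."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []            # prefix has left the taxonomy
            node = node[tok]
        return list(node.keys())     # empty means a complete label was reached

# Toy example with three multi-token labels:
trie = LabelTrie([[12, 7], [12, 9, 3], [44, 5]])
print(trie.allowed_next([]))    # [12, 44]
print(trie.allowed_next([12]))  # [7, 9]
```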
https://arxiv.org/abs/2309.12075
In-context learning (ICL) using large language models for tasks with many labels is challenging due to the limited context window, which makes it difficult to fit a sufficient number of examples in the prompt. In this paper, we use a pre-trained dense retrieval model to bypass this limitation, giving the model only a partial view of the full label space for each inference call. Testing with recent open-source LLMs (OPT, LLaMA), we set new state-of-the-art performance in few-shot settings for three common intent classification datasets, with no fine-tuning. We also surpass fine-tuned performance on fine-grained sentiment classification in certain cases. We analyze the performance across the number of in-context examples and different model scales, showing that larger models are necessary to effectively and consistently make use of larger context lengths for ICL. By running several ablations, we analyze the model's use of: a) the similarity of the in-context examples to the current input, b) the semantic content of the class names, and c) the correct correspondence between examples and labels. We demonstrate that all three are needed to varying degrees depending on the domain, contrary to certain recent works.
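A minimal sketch of the retrieval step follows; `embed` is a stand-in for any pre-trained dense encoder, and only the retrieved neighbors' labels enter the prompt, so each inference call sees just a slice of the full label space.

```python
# A minimal sketch, assuming a dense sentence encoder; `embed` is a
# placeholder and should be replaced by a real retrieval model.
import numpy as np

def embed(texts):
    rng = np.random.default_rng(0)            # placeholder encoder
    return rng.normal(size=(len(texts), 128))

def build_prompt(test_input, train_texts, train_labels, k=8):
    E = embed(train_texts)
    q = embed([test_input])[0]
    sims = E @ q / (np.linalg.norm(E, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]               # k nearest demonstrations
    demos = "\n".join(f"Input: {train_texts[i]}\nLabel: {train_labels[i]}"
                      for i in top)
    return f"{demos}\nInput: {test_input}\nLabel:"
```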
https://arxiv.org/abs/2309.10954
How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning? We propose natural language embedded programs (NLEP) as a unifying framework for addressing math/symbolic reasoning, natural language understanding, and instruction following tasks. Our approach prompts a language model to generate full Python programs that define functions over data structures which contain natural language representations of structured knowledge. A Python interpreter then executes the generated code and prints the output. Despite using a task-general prompt, we find that this approach can improve upon strong baselines across a range of different tasks including math and symbolic reasoning, text classification, question answering, and instruction following. We further find the generated programs are often interpretable and enable post-hoc verification of the intermediate reasoning steps.
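A minimal sketch of the generate-and-execute loop is below; `lm_generate` is a placeholder for any LLM completion call, and the real task-general prompt used in the paper is considerably richer.

```python
# A minimal sketch of the NLEP loop: prompt an LM for a full Python program,
# execute it, and read the printed output as the answer.
import io, contextlib

PROMPT = """Write a complete Python program that solves the task below.
Store any needed knowledge in Python data structures containing natural
language, and print the final answer.

Task: {task}
Program:
"""

def lm_generate(prompt):                     # placeholder for a real LLM call
    return "answer = 17 + 25\nprint(answer)"

def run_nlep(task):
    program = lm_generate(PROMPT.format(task=task))
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(program, {})                    # interpreter executes the program
    return buf.getvalue().strip()

print(run_nlep("What is 17 + 25?"))          # -> 42
```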
https://arxiv.org/abs/2309.10814
We study semantic compression for text where meanings contained in the text are conveyed to a source decoder, e.g., for classification. The main motivator to move to such an approach of recovering the meaning without requiring exact reconstruction is the potential resource savings, both in storage and in conveying the information to another node. Towards this end, we propose semantic quantization and compression approaches for text where we utilize sentence embeddings and the semantic distortion metric to preserve the meaning. Our results demonstrate that the proposed semantic approaches result in substantial (orders of magnitude) savings in the required number of bits for message representation at the expense of very modest accuracy loss compared to the semantic agnostic baseline. We compare the results of proposed approaches and observe that resource savings enabled by semantic quantization can be further amplified by semantic clustering. Importantly, we observe the generalizability of the proposed methodology which produces excellent results on many benchmark text classification datasets with a diverse array of contexts.
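One concrete way to realize semantic quantization is a k-means codebook over sentence embeddings, so each message is transmitted as a centroid index costing only log2(K) bits; the encoder below is a placeholder, and the paper's distortion metric and codebook design may differ.

```python
# A minimal sketch: quantize sentence embeddings to an 8-bit codebook.
import numpy as np
from sklearn.cluster import KMeans

def embed(texts):                            # placeholder sentence encoder
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 64))

corpus = [f"sentence {i}" for i in range(1000)]
K = 256                                      # 2^8 centroids -> 8 bits/message
codebook = KMeans(n_clusters=K, n_init=10, random_state=0).fit(embed(corpus))

def compress(text):
    return int(codebook.predict(embed([text]))[0])   # an index, not raw text

def decompress(code):
    # A semantic surrogate vector for downstream use, e.g. classification.
    return codebook.cluster_centers_[code]
```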
https://arxiv.org/abs/2309.10809
One of the most popular downstream tasks in the field of Natural Language Processing is text classification. Text classification tasks become more daunting when the texts are code-mixed. Though they are not exposed to such text during pre-training, different BERT models have demonstrated success in tackling Code-Mixed NLP challenges. Moreover, to enhance their performance, Code-Mixed NLP models have depended on combining synthetic data with real-world data. It is crucial to understand how the BERT models' performance is impacted when they are pre-trained using corresponding code-mixed languages. In this paper, we introduce Tri-Distil-BERT, a multilingual model pre-trained on Bangla, English, and Hindi, and Mixed-Distil-BERT, a model fine-tuned on code-mixed data. Both models are evaluated across multiple NLP tasks and demonstrate competitive performance against larger models like mBERT and XLM-R. Our two-tiered pre-training approach offers efficient alternatives for multilingual and code-mixed language understanding, contributing to advancements in the field.
https://arxiv.org/abs/2309.10272
User-generated texts available on the web and social platforms are often long and semantically challenging, making them difficult to annotate. Obtaining human annotation becomes increasingly difficult as problem domains become more specialized. For example, many health NLP problems require domain experts to be a part of the annotation pipeline. Thus, it is crucial that we develop low-resource NLP solutions able to work with this set of limited-data problems. In this study, we employ Abstract Meaning Representation (AMR) graphs as a means to model low-resource Health NLP tasks sourced from various online health resources and communities. AMRs are well suited to model online health texts as they can represent multi-sentence inputs, abstract away from complex terminology, and model long-distance relationships between co-referring tokens. AMRs thus improve the ability of pre-trained language models to reason about high-complexity texts. Our experiments show that we can improve performance on 6 low-resource health NLP tasks by augmenting text embeddings with semantic graph embeddings. Our approach is task agnostic and easy to merge into any standard text classification pipeline. We experimentally validate that AMRs are useful in the modeling of complex texts by analyzing performance through the lens of two textual complexity measures: the Flesch Kincaid Reading Level and Syntactic Complexity. Our error analysis shows that AMR-infused language models perform better on complex texts and generally show less predictive variance in the presence of changing complexity.
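A minimal sketch of the fusion step is below: the semantic graph embedding is simply concatenated with the text embedding before the classification head, which is what makes the approach task-agnostic. The encoders and dimensions are assumptions.

```python
# A minimal sketch, assuming a text embedding (e.g., from a pre-trained LM)
# and an AMR graph embedding are computed upstream.
import torch
import torch.nn as nn

class AMRAugmentedClassifier(nn.Module):
    def __init__(self, text_dim=768, graph_dim=256, n_classes=2):
        super().__init__()
        self.head = nn.Linear(text_dim + graph_dim, n_classes)

    def forward(self, text_emb, graph_emb):
        # Concatenate the two views, then classify; this drops into any
        # standard text classification pipeline.
        return self.head(torch.cat([text_emb, graph_emb], dim=-1))

clf = AMRAugmentedClassifier()
logits = clf(torch.randn(4, 768), torch.randn(4, 256))   # batch of 4
```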
https://arxiv.org/abs/2309.09877
Most NLP tasks are modeled as supervised learning and thus require labeled training data to train effective models. However, manually producing such data at sufficient quality and quantity is known to be costly and time-intensive. Current research addresses this bottleneck by exploring a novel paradigm called zero-shot learning via dataset generation. Here, a powerful LLM is prompted with a task description to generate labeled data that can be used to train a downstream NLP model. For instance, an LLM might be prompted to "generate 500 movie reviews with positive overall sentiment, and another 500 with negative sentiment." The generated data could then be used to train a binary sentiment classifier, effectively leveraging an LLM as a teacher to a smaller student model. With this demo, we introduce Fabricator, an open-source Python toolkit for dataset generation. Fabricator implements common dataset generation workflows, supports a wide range of downstream NLP tasks (such as text classification, question answering, and entity recognition), and is integrated with well-known libraries to facilitate quick experimentation. With Fabricator, we aim to support researchers in conducting reproducible dataset generation experiments using LLMs and help practitioners apply this approach to train models for downstream tasks.
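A hypothetical sketch of the generate-then-train workflow follows; it shows the shape of the paradigm only and is not Fabricator's actual API.

```python
# A hypothetical sketch of LLM-based dataset generation; `lm_generate`
# stands in for a real LLM call, and the outputs here are dummies.
def lm_generate(prompt, n):
    return [f"synthetic review {i}" for i in range(n)]

def generate_dataset():
    data = []
    for label in ("positive", "negative"):
        prompt = f"Generate a movie review with {label} overall sentiment."
        data += [(text, label) for text in lm_generate(prompt, 500)]
    return data

# 1000 labeled examples with zero manual annotation; train any smaller
# "student" classifier on them downstream.
dataset = generate_dataset()
```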
https://arxiv.org/abs/2309.09582
In this paper we present the first investigation into the effectiveness of Large Language Models (LLMs) for Failure Mode Classification (FMC). FMC, the task of automatically labelling an observation with a corresponding failure mode code, is a critical task in the maintenance domain as it reduces the need for reliability engineers to spend their time manually analysing work orders. We detail our approach to prompt engineering to enable an LLM to predict the failure mode of a given observation using a restricted code list. We demonstrate that the performance of a GPT-3.5 model (F1=0.80) fine-tuned on annotated data is a significant improvement over a currently available text classification model (F1=0.60) trained on the same annotated data set. The fine-tuned model also outperforms the out-of-the box GPT-3.5 (F1=0.46). This investigation reinforces the need for high quality fine-tuning data sets for domain-specific tasks using LLMs.
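A hypothetical sketch of a restricted-list prompt of the kind described is shown below; the failure-mode codes and wording are illustrative, not the paper's.

```python
# A hypothetical restricted-code-list prompt for failure mode classification.
FAILURE_MODES = ["leaking", "overheating", "vibration", "corrosion"]

def build_fmc_prompt(observation):
    codes = ", ".join(FAILURE_MODES)
    return (f"Classify the maintenance observation into exactly one failure "
            f"mode from this list: {codes}.\n"
            f"Observation: {observation}\nFailure mode:")

print(build_fmc_prompt("oil found pooling under gearbox"))
```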
https://arxiv.org/abs/2309.08181
In-context learning (ICL), i.e., showing LLMs only a few task-specific demonstrations, has led to downstream gains with no task-specific fine-tuning required. However, LLMs are sensitive to the choice of prompts, and therefore a crucial research question is how to select good demonstrations for ICL. One effective strategy is leveraging semantic similarity between the ICL demonstrations and test inputs by using a text retriever, which, however, is sub-optimal as it does not consider the LLM's existing knowledge about the task. From prior work (Min et al., 2022), we already know that labels paired with the demonstrations bias the model predictions. This leads to our hypothesis: can considering the LLM's existing knowledge about the task, especially with respect to the output label space, enable a better demonstration selection strategy? Through extensive experimentation on three text classification tasks, we find that it is beneficial to not only choose semantically similar ICL demonstrations but also to choose those demonstrations that help resolve the inherent label ambiguity surrounding the test example. Interestingly, we find that including demonstrations that the LLM previously misclassified and that also fall on the test example's decision boundary brings the most performance gain.
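A minimal sketch of ambiguity-aware selection under stated assumptions: `sim` is a retriever score, `label_probs` returns the LLM's label distribution for the test input, and each pool entry stores a previous model prediction; the top-2 ambiguity set and field names are illustrative, not the paper's exact procedure.

```python
# A minimal, illustrative sketch of demonstration selection that combines
# semantic similarity with resolution of the test input's label ambiguity.
def select_demos(test_x, pool, sim, label_probs, k=8):
    # 1) The two most probable labels form the test input's ambiguity set.
    probs = label_probs(test_x)
    ambiguous = {l for l, _ in sorted(probs.items(), key=lambda kv: -kv[1])[:2]}

    # 2) Rank the pool by semantic similarity to the test input.
    ranked = sorted(pool, key=lambda ex: -sim(test_x, ex["text"]))

    # 3) Prefer demos whose gold label lies in the ambiguity set, especially
    #    ones the model previously misclassified (near the decision boundary).
    demos = [ex for ex in ranked
             if ex["gold"] in ambiguous and ex["model_pred"] != ex["gold"]]
    demos += [ex for ex in ranked
              if ex["gold"] in ambiguous and ex not in demos]
    return demos[:k]
```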
https://arxiv.org/abs/2309.07900
Automatic identification of clinical trials for which a patient is eligible is complicated by the fact that trial eligibility is stated in natural language. A potential solution to this problem is to employ text classification methods for common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a pre-trained language model specifically for clinical trials, which yields the highest average performance across all criteria.
https://arxiv.org/abs/2309.07812
Large Language Models (LLMs) have shown impressive performance across a variety of Artificial Intelligence (AI) and natural language processing tasks, such as content creation, report generation, etc. However, unregulated malign application of these models can create undesirable consequences such as generation of fake news, plagiarism, etc. As a result, accurate detection of AI-generated language can be crucial in responsible usage of LLMs. In this work, we explore 1) whether a certain body of text is AI-generated or written by a human, and 2) attribution of a specific language model in generating a body of text. Texts in both English and Spanish are considered. The datasets used in this study are provided as part of the Automated Text Identification (AuTexTification) shared task. For each of the research objectives stated above, we propose an ensemble neural model that generates probabilities from different pre-trained LLMs, which are then used as features for a Traditional Machine Learning (TML) classifier. For the first task of distinguishing between AI- and human-generated text, our model ranked in fifth and thirteenth place (with macro $F1$ scores of 0.733 and 0.649) for English and Spanish texts, respectively. For the second task on model attribution, our model ranked in first place with macro $F1$ scores of 0.625 and 0.653 for English and Spanish texts, respectively.
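A minimal sketch of the stacking ensemble follows: per-class probabilities from several PLMs become the feature vector for a traditional classifier. `plm_probs` is a placeholder for each fine-tuned model's softmax output.

```python
# A minimal sketch: stack LLM class probabilities as features for a TML model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def plm_probs(model_id, texts, n_classes=2):      # placeholder scorer
    rng = np.random.default_rng(model_id)
    p = rng.random((len(texts), n_classes))
    return p / p.sum(axis=1, keepdims=True)

def featurize(texts, model_ids=(0, 1, 2)):
    # One probability block per pre-trained model, concatenated.
    return np.hstack([plm_probs(m, texts) for m in model_ids])

train_texts, y = ["a", "b", "c", "d"] * 10, [0, 1, 0, 1] * 10
tml = LogisticRegression(max_iter=1000).fit(featurize(train_texts), y)
pred = tml.predict(featurize(["is this text AI-generated?"]))
```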
https://arxiv.org/abs/2309.07755
Social media platforms play an essential role in crisis communication, but analyzing crisis-related social media texts is challenging due to their informal nature. Transformer-based pre-trained models like BERT and RoBERTa have shown success in various NLP tasks, but they are not tailored for crisis-related texts. Furthermore, general-purpose sentence encoders are used to generate sentence embeddings, regardless of the textual complexities in crisis-related texts. Advances in applications like text classification, semantic search, and clustering contribute to effective processing of crisis-related texts, which is essential for emergency responders to gain a comprehensive view of a crisis event, whether historical or real-time. To address these gaps in crisis informatics literature, this study introduces CrisisTransformers, an ensemble of pre-trained language models and sentence encoders trained on an extensive corpus of over 15 billion word tokens from tweets associated with more than 30 crisis events, including disease outbreaks, natural disasters, conflicts, and other critical incidents. We evaluate existing models and CrisisTransformers on 18 crisis-specific public datasets. Our pre-trained models outperform strong baselines across all datasets in classification tasks, and our best-performing sentence encoder improves the state-of-the-art by 17.43% in sentence encoding tasks. Additionally, we investigate the impact of model initialization on convergence and evaluate the significance of domain-specific models in generating semantically meaningful sentence embeddings. All models are publicly released (this https URL), with the anticipation that they will serve as a robust baseline for tasks involving the analysis of crisis-related social media texts.
https://arxiv.org/abs/2309.05494
Meta learning has achieved promising performance in low-resource text classification, which aims to identify target classes with knowledge transferred from source classes via sets of small tasks called episodes. However, due to the limited training data in the meta-learning scenario and the inherent properties of parameterized neural networks, poor generalization has become a pressing problem that needs to be addressed. To deal with this issue, we propose a meta-learning based method called Retrieval-Augmented Meta Learning (RAML). It not only uses parameterization for inference but also retrieves non-parametric knowledge from an external corpus to make inferences, which greatly alleviates the poor generalization caused by the lack of diverse training data in meta-learning. This method differs from previous models that solely rely on parameters, as it explicitly emphasizes the importance of non-parametric knowledge, aiming to strike a balance between parameterized neural networks and non-parametric knowledge. The model is required to determine which knowledge to access and utilize during inference. Additionally, our multi-view passage fusion network module can effectively and efficiently integrate the retrieved information into the low-resource classification task. Extensive experiments demonstrate that RAML significantly outperforms current SOTA low-resource text classification models.
https://arxiv.org/abs/2309.04979
Fuzzy Fingerprints have been successfully used as an interpretable text classification technique, but, like most other techniques, they have been largely surpassed in performance by Large Pre-trained Language Models, such as BERT or RoBERTa. These models deliver state-of-the-art results in several Natural Language Processing tasks, namely Emotion Recognition in Conversations (ERC), but suffer from a lack of interpretability and explainability. In this paper, we propose to combine the two approaches to perform ERC, as a means to obtain simpler and more interpretable Large Language Model-based classifiers. We propose to feed the utterances and their previous conversational turns to a pre-trained RoBERTa, obtaining contextual embedding utterance representations that are then supplied to an adapted Fuzzy Fingerprint classification module. We validate our approach on the widely used DailyDialog ERC benchmark dataset, on which we obtain state-of-the-art-level results using a much lighter model.
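For intuition, a minimal sketch of a classic Fuzzy Fingerprint classifier over raw token counts is given below; the paper instead feeds RoBERTa's contextual utterance embeddings into an adapted fingerprint module, so treat this only as background on the technique.

```python
# A minimal sketch of classic fuzzy fingerprints: each class keeps its top-k
# features with a rank-decaying membership, and classification picks the
# class whose fingerprint best matches the input.
from collections import Counter

def fingerprint(texts, k=50):
    counts = Counter(tok for t in texts for tok in t.lower().split())
    top = [tok for tok, _ in counts.most_common(k)]
    return {tok: 1 - i / k for i, tok in enumerate(top)}  # decaying membership

def similarity(fp, text, k=50):
    toks = set(text.lower().split())
    return sum(mu for tok, mu in fp.items() if tok in toks) / k

def classify(text, class_fps):
    return max(class_fps, key=lambda c: similarity(class_fps[c], text))

class_fps = {"joy": fingerprint(["what a great day"]),
             "anger": fingerprint(["this is awful"])}
print(classify("a great day indeed", class_fps))   # -> "joy"
```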
https://arxiv.org/abs/2309.04292
The statistical analysis of a large-scale legal corpus can provide valuable legal insights. For such analysis one needs to (1) select a subset of the corpus using document retrieval tools, (2) structure the text using information extraction (IE) systems, and (3) visualize the data for statistical analysis. Each process demands either specialized tools or programming skills, whereas no comprehensive unified "no-code" tools have been available. Especially for IE, if the target information is not predefined in the ontology of the IE system, one needs to build one's own system. Here we provide NESTLE, a no-code tool for large-scale statistical analysis of legal corpora. With NESTLE, users can search target documents, extract information, and visualize the structured data, all via the chat interface, with an accompanying auxiliary GUI for fine-level control. NESTLE consists of three main components: a search engine, an end-to-end IE system, and a Large Language Model (LLM) that glues the components together and provides the chat interface. Powered by the LLM and the end-to-end IE system, NESTLE can extract any type of information that has not been predefined in the IE system, opening up the possibility of unlimited customizable statistical analysis of the corpus without writing a single line of code. The use of the custom end-to-end IE system also enables faster and lower-cost IE on a large-scale corpus. We validate our system on 15 Korean precedent IE tasks and 3 legal text classification tasks from LexGLUE. The comprehensive experiments reveal NESTLE can achieve GPT-4-comparable performance by training the internal IE module with 4 human-labeled and 192 LLM-labeled examples. The detailed analysis provides insight into the trade-off between accuracy, time, and cost in building such a system.
https://arxiv.org/abs/2309.04146
Sentiment analysis is a pivotal task in the domain of natural language processing. It encompasses both text-level sentiment polarity classification and word-level Part-of-Speech (POS) sentiment polarity determination. Such analysis challenges models to understand text holistically while also extracting nuanced information. With the rise of Large Language Models (LLMs), new avenues for sentiment analysis have opened. This paper proposes enhancing performance by leveraging the Mutual Reinforcement Effect (MRE) between individual words and the overall text. It delves into how word polarity influences the overarching sentiment of a passage. To support our research, we annotated four novel Sentiment Text Classification and Part of Speech (SCPOS) datasets, building upon existing sentiment classification datasets. Furthermore, we developed a Universal Sentiment Analysis (USA) model with 7 billion parameters. Experimental results revealed that our model surpassed the performance of gpt-3.5-turbo across all four datasets, underscoring the significance of MRE in sentiment analysis.
https://arxiv.org/abs/2309.03787
Social media processing is a fundamental task in natural language processing with numerous applications. As Vietnamese social media and information science have grown rapidly, the necessity of information-based mining on Vietnamese social media has become crucial. However, state-of-the-art research faces several significant drawbacks, including imbalanced and noisy data on social media platforms. Imbalance and noise are two essential issues that need to be addressed in Vietnamese social media texts. Graph Convolutional Networks can address these problems in text classification on social media by taking advantage of the graph structure of the data. This study presents a novel approach based on a contextualized language model (PhoBERT) and a graph-based method (Graph Convolutional Networks). In particular, the proposed approach, ViCGCN, jointly trains the power of contextualized embeddings with the ability of Graph Convolutional Networks (GCNs) to capture more syntactic and semantic dependencies and address those drawbacks. Extensive experiments on various Vietnamese benchmark datasets were conducted to verify our approach. We observe that applying GCN to BERTology models as the final layer significantly improves performance. Moreover, the experiments demonstrate that ViCGCN outperforms 13 powerful baseline models, including BERTology models, fused BERTology-GCN models, other baselines, and the SOTA, on three benchmark social media datasets. Our proposed ViCGCN approach demonstrates a significant improvement of up to 6.21%, 4.61%, and 2.63% over the best contextualized language models, including multilingual and monolingual ones, on the three benchmark datasets UIT-VSMEC, UIT-ViCTSD, and UIT-VSFC, respectively. Additionally, our integrated ViCGCN model achieves the best performance among the BERTology models integrated with GCN.
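A minimal sketch of the fusion idea is below: contextual embeddings (e.g., PhoBERT's [CLS] vectors) serve as node features for a graph convolution layer; the graph construction and joint-training specifics of ViCGCN may differ.

```python
# A minimal sketch: one GCN layer over document nodes whose features are
# contextualized embeddings; adjacency and dimensions are placeholders.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, X, A):
        # Symmetrically normalized adjacency: D^-1/2 (A + I) D^-1/2
        A_hat = A + torch.eye(A.size(0))
        d_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
        return torch.relu(self.lin(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X))

X = torch.randn(100, 768)                  # placeholder [CLS] embeddings
A = (torch.rand(100, 100) > 0.95).float()
A = ((A + A.t()) > 0).float()              # undirected document graph
logits = nn.Linear(256, 7)(GCNLayer(768, 256)(X, A))   # toy 7-class output
```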
https://arxiv.org/abs/2309.02902
Keeping up with research and finding related work is still a time-consuming task for academics. Researchers sift through thousands of studies to identify a few relevant ones. Automation techniques can help by increasing the efficiency and effectiveness of this task. To this end, we developed CRUISE-Screening, a web-based application for conducting living literature reviews - a type of literature review that is continuously updated to reflect the latest research in a particular field. CRUISE-Screening is connected to several search engines via an API, which allows for updating the search results periodically. Moreover, it can facilitate the process of screening for relevant publications by using text classification and question answering models. CRUISE-Screening can be used both by researchers conducting literature reviews and by those working on automating the citation screening process to validate their algorithms. The application is open-source: this https URL, and a demo is available under this URL: this https URL. We discuss the limitations of our tool in Appendix A.
https://arxiv.org/abs/2309.01684
With the ever-increasing potential of AI to perform personalised tasks, it is becoming essential to develop new machine learning techniques which are data-efficient and do not require hundreds or thousands of training examples. In this paper, we explore an Inductive Logic Programming approach for one-shot text classification. In particular, we explore the framework of Meta-Interpretive Learning (MIL), along with using common-sense background knowledge extracted from ConceptNet. Results indicate that MIL can learn text classification rules from a small number of training examples. Moreover, the higher the complexity of the chosen examples, the higher the accuracy of the outcome.
https://arxiv.org/abs/2308.15885