Few-Shot Class-Incremental Learning extends the Class-Incremental Learning problem: the model faces data scarcity while also having to address catastrophic forgetting. The problem remains open because recent works are built upon convolutional neural networks, which perform sub-optimally compared to transformer approaches. Our paper presents ROBUSTA, a robust transformer approach built upon the Compact Convolutional Transformer. The issue of overfitting due to few samples is overcome with a stochastic classifier, where the classifier's weights are sampled from a distribution with mean and variance vectors, thus increasing the likelihood of correct classifications, and with a batch-norm layer that stabilizes the training process. Catastrophic forgetting is dealt with via delta parameters, small task-specific trainable parameters, while the backbone network is kept frozen; a non-parametric approach is developed to infer the delta parameters for the model's predictions. A prototype rectification approach is applied to avoid biased prototype calculations caused by data scarcity. The advantage of ROBUSTA is demonstrated through a series of experiments on benchmark problems, where it outperforms prior art by large margins without any data augmentation protocols.
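The stochastic-classifier idea from the abstract can be sketched roughly as follows; the function name, the reparameterization trick, and the cosine normalization of sampled weights are illustrative assumptions on our part, not the paper's exact formulation:

```python
import numpy as np

def stochastic_classifier_logits(features, weight_mu, weight_sigma, rng):
    """Sample classifier weights W ~ N(mu, sigma^2) elementwise, then
    score features against each sampled class weight vector.

    features:      (batch, dim)   L2-normalized embeddings
    weight_mu:     (classes, dim) learned mean of each class weight
    weight_sigma:  (classes, dim) learned std (kept positive)
    """
    eps = rng.standard_normal(weight_mu.shape)
    w = weight_mu + weight_sigma * eps                 # reparameterization trick
    w = w / np.linalg.norm(w, axis=1, keepdims=True)   # cosine-style classifier
    return features @ w.T                              # (batch, classes)

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
mu = rng.standard_normal((3, 8))
sigma = np.full((3, 8), 0.1)
logits = stochastic_classifier_logits(feats, mu, sigma, rng)
```

Because sampling perturbs the decision boundaries at every draw, training against such a classifier discourages overfitting to the few support samples.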
https://arxiv.org/abs/2405.05984
Large Language Models (LLMs) have made great strides in areas such as language processing and computer vision. Despite the emergence of diverse techniques to improve few-shot learning capacity, current LLMs fall short in handling the languages of biology and chemistry. For example, they struggle to capture the relationship between molecular structure and pharmacochemical properties. Consequently, the few-shot learning capacity for small-molecule drug modification remains impeded. In this work, we introduce DrugLLM, an LLM tailored for drug design. During training, we employ Group-based Molecular Representation (GMR) to represent molecules, arranging them in sequences that reflect modifications aimed at enhancing specific molecular properties. DrugLLM learns how to modify molecules in drug discovery by predicting the next molecule based on past modifications. Extensive computational experiments demonstrate that DrugLLM can generate new molecules with expected properties based on limited examples, presenting a powerful few-shot molecule generation capacity.
https://arxiv.org/abs/2405.06690
Few-shot and zero-shot text classification aim to recognize samples from novel classes with limited labeled samples or no labeled samples at all. While prevailing methods have shown promising performance via transferring knowledge from seen classes to unseen classes, they are still limited by (1) inherent dissimilarities among classes, which make transferring features learned on seen classes to unseen classes both difficult and inefficient; and (2) scarce labeled novel samples, which usually cannot provide enough supervision signals for the model to adjust from the source distribution to the target distribution, especially in complicated scenarios. To alleviate these issues, we propose a simple and effective strategy for few-shot and zero-shot text classification. We aim to liberate the model from the confines of seen classes, thereby enabling it to predict unseen categories without the necessity of training on seen classes. Specifically, to mine more knowledge related to unseen categories, we utilize a large pre-trained language model to generate pseudo novel samples and select the most representative ones as category anchors. After that, we convert the multi-class classification task into a binary classification task and use the similarities of query-anchor pairs for prediction, fully leveraging the limited supervision signals. Extensive experiments on six widely used public datasets show that our proposed method significantly outperforms other strong baselines on few-shot and zero-shot tasks, even without using any seen-class samples.
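The query-anchor prediction step can be sketched as a nearest-anchor search over cosine similarities of each query-anchor pair; the function name, the max-pooling over a class's anchors, and the toy embeddings are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def predict_by_anchors(query, anchors):
    """anchors: dict mapping class name -> (n_anchors, dim) array of
    anchor embeddings. Each query-anchor pair is scored with cosine
    similarity (the binary 'same class?' signal); the predicted class is
    that of the best-matching anchor."""
    q = query / np.linalg.norm(query)
    best_cls, best_sim = None, -np.inf
    for cls, vecs in anchors.items():
        v = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        sim = float((v @ q).max())       # best pair score for this class
        if sim > best_sim:
            best_cls, best_sim = cls, sim
    return best_cls, best_sim

anchors = {
    "sports":  np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]),
    "finance": np.array([[0.0, 1.0, 0.0]]),
}
cls, sim = predict_by_anchors(np.array([0.8, 0.2, 0.0]), anchors)  # -> "sports"
```

Reducing the problem to pairwise "does this query match this anchor?" decisions is what lets a single binary matcher cover arbitrary unseen label sets.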
https://arxiv.org/abs/2405.03565
Effective image classification hinges on discerning relevant features from both foreground and background elements, with the foreground typically holding the critical information. While humans adeptly classify images with limited exposure, artificial neural networks often struggle with feature selection from rare samples. To address this challenge, we propose a novel method for selecting class-relevant patch embeddings. Our approach involves splitting support and query images into patches, encoding them using a pre-trained Vision Transformer (ViT) to obtain class embeddings and patch embeddings, respectively. Subsequently, we filter patch embeddings using class embeddings to retain only the class-relevant ones. For each image, we calculate the similarity between class embedding and each patch embedding, sort the similarity sequence in descending order, and only retain top-ranked patch embeddings. By prioritizing similarity between the class embedding and patch embeddings, we select top-ranked patch embeddings to be fused with class embedding to form a comprehensive image representation, enhancing pattern recognition across instances. Our strategy effectively mitigates the impact of class-irrelevant patch embeddings, yielding improved performance in pre-trained models. Extensive experiments on popular few-shot classification benchmarks demonstrate the simplicity, efficacy, and computational efficiency of our approach, outperforming state-of-the-art baselines under both 5-shot and 1-shot scenarios.
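The patch-selection step described above reduces to ranking patch embeddings by cosine similarity to the class embedding and keeping the top-k; the averaging used for fusion below is one plausible choice on our part, not necessarily the paper's exact fusion:

```python
import numpy as np

def select_class_relevant_patches(class_emb, patch_embs, k):
    """Rank patch embeddings by cosine similarity to the class embedding,
    keep only the top-k, and fuse them with the class embedding (here by
    simple averaging) into one comprehensive image representation."""
    c = class_emb / np.linalg.norm(class_emb)
    p = patch_embs / np.linalg.norm(patch_embs, axis=1, keepdims=True)
    sims = p @ c                              # cosine similarity per patch
    top = np.argsort(sims)[::-1][:k]          # indices of top-ranked patches
    fused = np.vstack([class_emb[None], patch_embs[top]]).mean(axis=0)
    return top, fused

class_emb = np.array([1.0, 0.0])
patch_embs = np.array([[1.0, 0.1],   # foreground-like patch
                       [0.0, 1.0],   # background-like patch
                       [0.9, 0.0]])  # foreground-like patch
top, fused = select_class_relevant_patches(class_emb, patch_embs, k=2)
```

In this toy example the background-like patch is the one filtered out, which is exactly the effect the abstract attributes to the method.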
https://arxiv.org/abs/2405.03722
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples. This ability stems from their capacity to identify common features shared between new and previously seen images while disregarding distractions such as background variations. For artificial neural network models, however, determining the most relevant features for distinguishing between two images with limited samples presents a challenge. In this paper, we propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches and encoding them using the pre-trained Vision Transformer (ViT) architecture. Specifically, we swap the class (CLS) token and patch tokens between the support and query sets to obtain mutual attention, which enables each set to focus on the most useful information. This strengthens intra-class representations and promotes closer proximity between instances of the same class. For implementation, we adopt the ViT-based network architecture and utilize pre-trained model parameters obtained through self-supervision. By leveraging Masked Image Modeling as a self-supervised pre-training task, the pre-trained model yields semantically meaningful representations while successfully avoiding supervision collapse. We then employ a meta-learning method to fine-tune the last several layers and the CLS token modules. Our strategy significantly reduces the number of parameters that require fine-tuning while effectively utilizing the capability of the pre-trained model. Extensive experiments show that our framework is simple, effective, and computationally efficient, achieving superior performance compared to state-of-the-art baselines on five popular few-shot classification benchmarks under the 5-shot and 1-shot scenarios.
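The token-swap idea can be illustrated with a deliberately simplified single-head attention step, where one set's CLS token attends over the other set's patch tokens; all names, the single-head form, and the toy tokens are our assumptions for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_set_cls_attention(cls_token, other_patches):
    """One set's CLS token attends over the *other* set's patch tokens,
    pooling the patches most relevant to it -- a minimal sketch of the
    mutual attention obtained by swapping tokens between sets."""
    scores = other_patches @ cls_token     # (num_patches,) attention logits
    weights = softmax(scores)
    return weights @ other_patches         # attended summary vector

# e.g., the support CLS token attends over query patches
# (the method applies the swap symmetrically in both directions)
support_cls = np.array([1.0, 0.0])
query_patches = np.array([[1.0, 0.0],     # patch aligned with support CLS
                          [0.0, 1.0]])    # unrelated patch
attended = cross_set_cls_attention(support_cls, query_patches)
```

The attended vector is pulled toward the patches that match the other set's class token, which is the mechanism behind the tighter intra-class representations claimed above.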
https://arxiv.org/abs/2405.03109
Human cognition exhibits systematic compositionality, the algebraic ability to generate infinite novel combinations from finite learned components, which is the key to understanding and reasoning about complex logic. In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset \textsc{MathTrap}\footnotemark[3] by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8k. Since problems with logical flaws are quite rare in the real world, these represent ``unseen'' cases to LLMs. Solving these requires the models to systematically compose (1) the mathematical knowledge involved in the original problems with (2) knowledge related to the introduced traps. Our experiments show that while LLMs possess both components of requisite knowledge, they do not \textbf{spontaneously} combine them to handle these novel cases. We explore several methods to mitigate this deficiency, such as natural language prompts, few-shot demonstrations, and fine-tuning. We find that LLMs' performance can be \textbf{passively} improved through the above external intervention. Overall, systematic compositionality remains an open challenge for large language models.
https://arxiv.org/abs/2405.06680
This paper reports on a series of experiments with a novel dataset evaluating how well Large Language Models (LLMs) can mark (i.e., grade) open-text responses to short-answer questions. Specifically, we explore how well different combinations of GPT version and prompt-engineering strategy performed at marking real student answers to short-answer questions across different domain areas (Science and History) and grade levels (spanning ages 5-16), using a new, never-before-used dataset from Carousel, a quizzing platform. We found that GPT-4 with basic few-shot prompting performed well (Kappa of 0.70) and, importantly, very close to human-level performance (0.75). This research builds on prior findings that GPT-4 could reliably score short-answer reading comprehension questions at a performance level very close to that of expert human raters. The proximity to human-level performance across a variety of subjects and grade levels suggests that LLMs could be a valuable tool for supporting low-stakes formative assessment tasks in K-12 education and has important implications for real-world education delivery.
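The Kappa figures above compare model and human marks against chance-corrected agreement; assuming unweighted Cohen's kappa (the abstract does not say which variant), the statistic is simple to compute from two raters' labels:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: observed agreement between two raters,
    corrected for the agreement expected by chance from each rater's
    marginal label frequencies."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    labels = set(ca) | set(cb)
    expected = sum(ca[l] * cb[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# hypothetical human vs. model marks on six student answers
human = ["correct", "wrong", "correct", "correct", "wrong", "correct"]
model = ["correct", "wrong", "correct", "wrong",   "wrong", "correct"]
print(round(cohens_kappa(human, model), 3))  # 0.667
```

A kappa of 0.70, as reported for GPT-4, is conventionally read as substantial agreement, which is why its closeness to the human-human 0.75 matters.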
https://arxiv.org/abs/2405.02985
Generative models have garnered considerable attention for their application in addressing the scarcity of abnormal samples in the industrial Internet of Things (IoT). However, challenges persist regarding the edge deployment of generative models and the optimization of joint edge AI-generated content (AIGC) tasks. In this paper, we focus on the edge optimization of AIGC task execution and propose GMEL, a generative-model-driven industrial AIGC collaborative edge learning framework. This framework aims to facilitate efficient few-shot learning by leveraging realistic sample synthesis and edge-based optimization capabilities. First, a multi-task AIGC computational offloading model is presented to ensure the efficient execution of heterogeneous AIGC tasks on edge servers. Then, we propose an attention-enhanced multi-agent reinforcement learning (AMARL) algorithm aimed at refining offloading policies within the IoT system, thereby supporting generative-model-driven edge learning. Finally, our experimental results demonstrate the effectiveness of the proposed algorithm in optimizing the total system latency of edge-based AIGC task completion.
https://arxiv.org/abs/2405.02972
Natural Language Inference (NLI) is a cornerstone of Natural Language Processing (NLP), providing insights into the entailment relationships between text pairings. It is a critical component of Natural Language Understanding (NLU), demonstrating the ability to extract information from spoken or written interactions. NLI is mainly concerned with determining the entailment relationship between two statements, known as the premise and hypothesis. When the premise logically implies the hypothesis, the pair is labeled ``entailment''. If the hypothesis contradicts the premise, the pair receives the ``contradiction'' label. When there is insufficient evidence to establish a connection, the pair is described as ``neutral''. Despite the success of Large Language Models (LLMs) in various tasks, their effectiveness in NLI remains constrained by issues like low-resource domain accuracy, model overconfidence, and difficulty in capturing human judgment disagreements. This study addresses the underexplored area of evaluating LLMs in low-resourced languages such as Bengali. Through a comprehensive evaluation, we assess the performance of prominent LLMs and state-of-the-art (SOTA) models in Bengali NLP tasks, focusing on natural language inference. Utilizing the XNLI dataset, we conduct zero-shot and few-shot evaluations, comparing LLMs like GPT-3.5 Turbo and Gemini 1.5 Pro with models such as BanglaBERT, Bangla BERT Base, DistilBERT, mBERT, and sahajBERT. Our findings reveal that while LLMs can achieve comparable or superior performance to fine-tuned SOTA models in few-shot scenarios, further research is necessary to enhance our understanding of LLMs in languages with modest resources like Bengali. This study underscores the importance of continued efforts in exploring LLM capabilities across diverse linguistic contexts.
https://arxiv.org/abs/2405.02937
We introduce LexBench, a comprehensive evaluation suite designed to test language models (LMs) on ten semantic phrase processing tasks. Unlike prior studies, it is the first work to propose a framework from the comparative perspective covering the general semantic phrase (i.e., lexical collocation) and three fine-grained semantic phrases: the idiomatic expression, the noun compound, and the verbal construction. Using \ourbenchmark, we assess the performance of 15 LMs across model architectures and parameter scales in classification, extraction, and interpretation tasks. Through the experiments, we first validate the scaling law and find that, as expected, large models outperform smaller ones in most tasks. Second, we investigate further through scaling semantic relation categorization and find that few-shot LMs still lag behind vanilla fine-tuned models on this task. Third, through human evaluation, we find that the performance of strong models is comparable to the human level regarding semantic phrase processing. Our benchmarking findings can serve future research aiming to improve the generic capability of LMs on semantic phrase comprehension. Our source code and data are available at this https URL
https://arxiv.org/abs/2405.02861
In this paper, we aim to adapt a model at test time using a few unlabeled data to address distribution shifts. To tackle the challenge of extracting domain knowledge from a limited amount of data, it is crucial to utilize correlated information from pre-trained backbones and source domains. Previous studies fail to utilize recent foundation models with strong out-of-distribution generalization. Additionally, domain-centric designs are not favored in their works. Furthermore, they treat modelling the source domains and learning to adapt as independent, disjoint training stages. In this work, we propose an approach on top of the pre-computed features of the foundation model. Specifically, we build a knowledge bank to learn the transferable knowledge from source domains. Conditioned on few-shot target data, we introduce a domain prompt generator to condense the knowledge bank into a domain-specific prompt. The domain prompt then directs the visual features towards a particular domain via a guidance module. Moreover, we propose a domain-aware contrastive loss and employ meta-learning to facilitate domain knowledge extraction. Extensive experiments are conducted to validate the domain knowledge extraction. The proposed method outperforms previous work on 5 large-scale benchmarks including WILDS and DomainNet.
https://arxiv.org/abs/2405.02797
Large language models (LLMs) have shown remarkable adaptability to diverse tasks, by leveraging context prompts containing instructions, or minimal input-output examples. However, recent work revealed they also exhibit label bias -- an undesirable preference toward predicting certain answers over others. Still, detecting and measuring this bias reliably and at scale has remained relatively unexplored. In this study, we evaluate different approaches to quantifying label bias in a model's predictions, conducting a comprehensive investigation across 279 classification tasks and ten LLMs. Our investigation reveals substantial label bias in models both before and after debiasing attempts, as well as highlights the importance of outcomes-based evaluation metrics, which were not previously used in this regard. We further propose a novel label bias calibration method tailored for few-shot prompting, which outperforms recent calibration approaches for both improving performance and mitigating label bias. Our results emphasize that label bias in the predictions of LLMs remains a barrier to their reliability.
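The paper's own calibration method is novel; as background for the "recent calibration approaches" it compares against, here is a minimal sketch of the general content-free calibration recipe (estimate the model's prior toward each label from a content-free input such as "N/A", divide it out, and renormalize). The function name and numbers are illustrative:

```python
import numpy as np

def calibrate(label_probs, content_free_probs):
    """Remove label bias by dividing out the model's prior toward each
    label (estimated from a content-free input) and renormalizing."""
    p = np.asarray(label_probs, dtype=float) / np.asarray(content_free_probs, dtype=float)
    return p / p.sum()

# A model that prefers label 0 ("yes") regardless of input: the raw
# prediction favors "yes", but after dividing out the 0.8/0.2 prior,
# the calibrated distribution favors "no".
out = calibrate([0.6, 0.4], [0.8, 0.2])
```

Outcome-based metrics of the kind the abstract advocates would then be computed on predictions taken from the calibrated distribution rather than the raw one.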
https://arxiv.org/abs/2405.02743
With the deluge of information delivered by the daily news cycle, there is a growing need to effectively and efficiently summarize news feeds for quick consumption. We leverage large language models (LLMs), with their advanced learning and generative abilities as compared to conventional language models, to generate concise and coherent summaries for news articles from the XSum dataset. Our paper focuses on two key aspects of LLMs: Efficient in-context Learning (ELearn) and Parameter Efficient Fine-tuning (EFit). Under ELearn, we find that increasing the number of shots in prompts and utilizing simple templates generally improve the quality of summaries. We also find that utilizing relevant examples in few-shot learning for ELearn does not improve model performance. In addition, we studied EFit using different methods and demonstrate that fine-tuning the first layer of LLMs produces better outcomes as compared to fine-tuning other layers or utilizing LoRA. We also find that leveraging more relevant training samples using selective layers does not result in better performance. By combining ELearn and EFit, we create a new model (ELearnFit) that leverages the benefits of both few-shot learning and fine-tuning and produces superior performance to either model alone. We also use ELearnFit to highlight the trade-offs between prompting and fine-tuning, especially for situations where only a limited number of annotated samples are available. Ultimately, our research provides practical techniques to optimize news summarization during the prompting and fine-tuning stages and enhances the synthesis of news articles.
https://arxiv.org/abs/2405.02710
Despite the success of large language models (LLMs) in Text-to-SQL tasks, open-source LLMs encounter challenges in contextual understanding and response coherence. To tackle these issues, we present \ours, a systematic methodology tailored for Text-to-SQL with open-source LLMs. Our contributions include a comprehensive evaluation of open-source LLMs in Text-to-SQL tasks, the \openprompt strategy for effective question representation, and novel strategies for supervised fine-tuning. We explore the benefits of Chain-of-Thought in step-by-step inference and propose the \openexample method for enhanced few-shot learning. Additionally, we introduce token-efficient techniques, such as \textbf{Variable-length Open DB Schema}, \textbf{Target Column Truncation}, and \textbf{Example Column Truncation}, addressing challenges in large-scale databases. Our findings emphasize the need for further investigation into the impact of supervised fine-tuning on contextual learning capabilities. Remarkably, our method significantly improved Llama2-7B from 2.54\% to 41.04\% and Code Llama-7B from 14.54\% to 48.24\% on the BIRD-Dev dataset. Notably, the performance of Code Llama-7B surpassed GPT-4 (46.35\%) on the BIRD-Dev dataset.
https://arxiv.org/abs/2405.06674
Advancements in machine learning, computer vision, and robotics have paved the way for transformative solutions in various domains, particularly in agriculture. For example, accurate identification and segmentation of fruits from field images play a crucial role in automating jobs such as harvesting, disease detection, and yield estimation. However, achieving robust and precise infield fruit segmentation remains a challenging task, since large amounts of labeled data are required to handle variations in fruit size, shape, color, and occlusion. In this paper, we develop a few-shot semantic segmentation framework for infield fruits using transfer learning. Concretely, our work is aimed at addressing agricultural domains that lack publicly available labeled data. Motivated by similar success in urban scene parsing, we propose specialized pre-training using a public benchmark dataset for fruit transfer learning. By leveraging pre-trained neural networks, accurate semantic segmentation of fruit in the field is achieved with only a few labeled images. Furthermore, we show that models with pre-training learn to distinguish between fruit still on the trees and fruit that have fallen on the ground, and they can effectively transfer the knowledge to the target fruit dataset.
https://arxiv.org/abs/2405.02556
Advancements in wearable sensor technologies and the digitization of medical records have contributed to the unprecedented ubiquity of biomedical time series data. Data-driven models have tremendous potential to assist clinical diagnosis and improve patient care by improving long-term monitoring capabilities, facilitating early disease detection and intervention, as well as promoting personalized healthcare delivery. However, accessing extensively labeled datasets to train data-hungry deep learning models encounters many barriers, such as long-tail distribution of rare diseases, cost of annotation, privacy and security concerns, data-sharing regulations, and ethical considerations. An emerging approach to overcome the scarcity of labeled data is to augment AI methods with human-like capabilities to leverage past experiences to learn new tasks with limited examples, called few-shot learning. This survey provides a comprehensive review and comparison of few-shot learning methods for biomedical time series applications. The clinical benefits and limitations of such methods are discussed in relation to traditional data-driven approaches. This paper aims to provide insights into the current landscape of few-shot learning for biomedical time series and its implications for future research and applications.
https://arxiv.org/abs/2405.02485
The development of large vision-language models, notably CLIP, has catalyzed research into effective adaptation techniques, with a particular focus on soft prompt tuning. Conjointly, test-time augmentation, which utilizes multiple augmented views of a single image to enhance zero-shot generalization, is emerging as a significant area of interest. This has predominantly directed research efforts toward test-time prompt tuning. In contrast, we introduce a robust MeanShift for Test-time Augmentation (MTA), which surpasses prompt-based methods without requiring this intensive training procedure. This positions MTA as an ideal solution for both standalone and API-based applications. Additionally, our method does not rely on ad hoc rules (e.g., confidence threshold) used in some previous test-time augmentation techniques to filter the augmented views. Instead, MTA incorporates a quality assessment variable for each view directly into its optimization process, termed the inlierness score. This score is jointly optimized with a density mode seeking process, leading to an efficient training- and hyperparameter-free approach. We extensively benchmark our method on 15 datasets and demonstrate MTA's superiority and computational efficiency. Easily deployed as a plug-and-play module on top of zero-shot models and state-of-the-art few-shot methods, MTA shows systematic and consistent improvements.
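In MTA the inlierness scores are optimized jointly with the mode seeking; the sketch below fixes them to constants to illustrate only the weighted density-mode-seeking step over augmented-view embeddings (bandwidth, kernel, and names are our assumptions):

```python
import numpy as np

def weighted_mean_shift(views, inlierness, bandwidth=1.0, steps=20):
    """Seek the density mode of augmented-view embeddings with a Gaussian
    kernel, weighting each view by its inlierness score (fixed here;
    MTA optimizes these scores jointly with the mode)."""
    mode = views.mean(axis=0)
    for _ in range(steps):
        d2 = ((views - mode) ** 2).sum(axis=1)           # squared distances
        w = inlierness * np.exp(-d2 / (2 * bandwidth**2)) # kernel * quality
        mode = (w[:, None] * views).sum(axis=0) / w.sum() # shifted estimate
    return mode

# three consistent views near the origin plus one degenerate crop far away
views = np.array([[0.0, 0.0], [0.1, -0.1], [0.05, 0.1], [5.0, 5.0]])
mode = weighted_mean_shift(views, inlierness=np.array([1.0, 1.0, 1.0, 0.1]))
```

The mode converges to the dense cluster of consistent views while the low-inlierness outlier view is effectively ignored, which is the role the confidence-threshold heuristics played in earlier test-time augmentation methods.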
https://arxiv.org/abs/2405.02266
Audio recordings may provide important evidence in criminal investigations. One such case is the forensic association of the recorded audio to the recording location. For example, a voice message may be the only investigative cue to narrow down the candidate sites for a crime. Up to now, several works provide tools for closed-set recording environment classification under relatively clean recording conditions. However, in forensic investigations, the candidate locations are case-specific. Thus, closed-set tools are not applicable without retraining on a sufficient amount of training samples for each case and respective candidate set. In addition, a forensic tool has to deal with audio material from uncontrolled sources with variable properties and quality. In this work, we therefore attempt a major step towards practical forensic application scenarios. We propose a representation learning framework called EnvId, short for environment identification. EnvId avoids case-specific retraining. Instead, it is the first tool for robust few-shot classification of unseen environment locations. We demonstrate that EnvId can handle forensically challenging material. It provides good quality predictions even under unseen signal degradations, environment characteristics or recording position mismatches. Our code and datasets will be made publicly available upon acceptance.
https://arxiv.org/abs/2405.02119
Recent few-shot action recognition (FSAR) methods achieve promising performance by performing semantic matching on learned discriminative features. However, most FSAR methods focus on single-scale (e.g., frame-level, segment-level, \etc) feature alignment, which ignores that human actions with the same semantic may appear at different velocities. To this end, we develop a novel Multi-Velocity Progressive-alignment (MVP-Shot) framework to progressively learn and align semantic-related action features at multi-velocity levels. Concretely, a Multi-Velocity Feature Alignment (MVFA) module is designed to measure the similarity between features from support and query videos with different velocity scales and then merge all similarity scores in a residual fashion. To avoid the multiple velocity features deviating from the underlying motion semantic, our proposed Progressive Semantic-Tailored Interaction (PSTI) module injects velocity-tailored text information into the video feature via feature interaction on channel and temporal domains at different velocities. The above two modules compensate for each other to predict query categories more accurately under the few-shot settings. Experimental results show our method outperforms current state-of-the-art methods on multiple standard few-shot benchmarks (i.e., HMDB51, UCF101, Kinetics, and SSv2-small).
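A rough, deliberately simplified reduction of the multi-velocity matching idea: compare support and query clips at several temporal strides ("velocities") and accumulate the per-velocity similarity scores in a residual fashion. The strides, mean pooling, and final averaging are our illustrative assumptions, not the MVFA module itself:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def multi_velocity_similarity(support_frames, query_frames, strides=(1, 2, 4)):
    """Compare a support clip and a query clip at several velocities by
    temporally subsampling their frame features, accumulating the
    per-velocity scores residually before averaging."""
    score = 0.0
    for s in strides:
        sup = support_frames[::s].mean(axis=0)  # pooled feature at this velocity
        qry = query_frames[::s].mean(axis=0)
        score = score + cosine(sup, qry)        # residual accumulation
    return score / len(strides)

rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 16))             # 8 frame features of dim 16
other = rng.standard_normal((8, 16))
```

Matching at multiple strides is what lets two executions of the same action at different speeds still produce a high aggregate score.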
https://arxiv.org/abs/2405.02077
Large Language Models (LLMs) are deep learning models designed to generate text based on textual input. Although researchers have been developing these models for more complex tasks such as code generation and general reasoning, few efforts have explored how LLMs can be applied to combinatorial problems. In this research, we investigate the potential of LLMs to solve the Travelling Salesman Problem (TSP). Utilizing GPT-3.5 Turbo, we conducted experiments employing various approaches, including zero-shot in-context learning, few-shot in-context learning, and chain-of-thoughts (CoT). Subsequently, we fine-tuned GPT-3.5 Turbo to solve a specific problem size and tested it using a set of various instance sizes. The fine-tuned models demonstrated promising performance on problems identical in size to the training instances and generalized well to larger problems. Furthermore, to improve the performance of the fine-tuned model without incurring additional training costs, we adopted a self-ensemble approach to improve the quality of the solutions.
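Evaluating any of these prompting setups requires scoring the tours the model proposes; a minimal sketch of a tour-length evaluator plus an illustrative few-shot prompt builder (the prompt format is an assumption on our part, not taken from the paper):

```python
import math

def tour_length(points, tour):
    """Total length of a closed tour over 2-D points -- used to score and,
    for self-ensembling, to pick the best of several proposed routes."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def few_shot_tsp_prompt(examples, query_points):
    """Build a few-shot prompt: each example pairs an instance with a good
    tour; the query instance is left for the model to complete."""
    parts = [f"Points: {pts}\nTour: {tour}" for pts, tour in examples]
    parts.append(f"Points: {query_points}\nTour:")
    return "\n\n".join(parts)

square = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(tour_length(square, [0, 1, 2, 3]))  # perimeter of the unit square: 4.0
prompt = few_shot_tsp_prompt([(square, [0, 1, 2, 3])], [(2, 2), (3, 3), (2, 3)])
```

A self-ensemble then amounts to sampling several completions and keeping the valid permutation with the smallest `tour_length`.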
https://arxiv.org/abs/2405.01997