Training a linear classifier or lightweight model on top of pretrained vision model outputs, so-called 'frozen features', leads to impressive performance on a number of downstream few-shot tasks. Currently, frozen features are not modified during training. On the other hand, when networks are trained directly on images, data augmentation is a standard recipe that improves performance with no substantial overhead. In this paper, we conduct an extensive pilot study on few-shot image classification that explores applying data augmentations in the frozen feature space, dubbed 'frozen feature augmentation (FroFA)', covering twenty augmentations in total. Our study demonstrates that adopting a deceptively simple pointwise FroFA, such as brightness, can improve few-shot performance consistently across three network architectures, three large pretraining datasets, and eight transfer datasets.
https://arxiv.org/abs/2403.10519
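The abstract does not spell out the augmentation operators, but a pointwise brightness-style FroFA can be sketched as a single random offset added to every element of a cached feature tensor. The additive form, the function name, and the `max_delta` parameter below are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def brightness_frofa(features, rng, max_delta=0.2):
    """Pointwise 'brightness'-style augmentation on frozen features:
    add one random scalar offset to every element, mirroring how a
    brightness jitter shifts all pixel intensities in image space."""
    delta = rng.uniform(-max_delta, max_delta)
    return features + delta

rng = np.random.default_rng(0)
frozen = rng.normal(size=(16, 768))   # e.g. 16 cached ViT token embeddings
augmented = brightness_frofa(frozen, rng)
```

Because the features are precomputed and frozen, such an augmentation adds essentially no overhead to training the lightweight head on top.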
Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime. However, these features often lack the spatial resolution to directly perform dense prediction tasks like segmentation and depth prediction because models aggressively pool information over large areas. In this work, we introduce FeatUp, a task- and model-agnostic framework to restore lost spatial information in deep features. We introduce two variants of FeatUp: one that guides features with high-resolution signal in a single forward pass, and one that fits an implicit model to a single image to reconstruct features at any resolution. Both approaches use a multi-view consistency loss with deep analogies to NeRFs. Our features retain their original semantics and can be swapped into existing applications to yield resolution and performance gains even without re-training. We show that FeatUp significantly outperforms other feature upsampling and image super-resolution approaches in class activation map generation, transfer learning for segmentation and depth prediction, and end-to-end training for semantic segmentation.
https://arxiv.org/abs/2403.10516
Humans can learn a new word and infer its grammatical properties from very few examples. They have an abstract notion of linguistic properties like grammatical gender and agreement rules that can be applied to novel syntactic contexts and words. Drawing inspiration from psycholinguistics, we conduct a noun learning experiment to assess whether an LSTM and a decoder-only transformer can achieve human-like abstraction of grammatical gender in French. Language models were tasked with learning the gender of a novel noun embedding from a few examples in one grammatical agreement context and predicting agreement in another, unseen context. We find that both language models effectively generalise novel noun gender from one to two learning examples and apply the learnt gender across agreement contexts, albeit with a bias for the masculine gender category. Importantly, the few-shot updates were only applied to the embedding layers, demonstrating that models encode sufficient gender information within the word embedding space. While the generalisation behaviour of models suggests that they represent grammatical gender as an abstract category, like humans, further work is needed to explore the details of how exactly this is implemented. For a comparative perspective with human behaviour, we conducted an analogous one-shot novel noun gender learning experiment, which revealed that native French speakers, like language models, also exhibited a masculine gender bias and are not excellent one-shot learners either.
https://arxiv.org/abs/2403.10338
The task of few-shot image classification and segmentation (FS-CS) involves classifying and segmenting target objects in a query image, given only a few examples of the target classes. We introduce the Vision-Instructed Segmentation and Evaluation (VISE) method that transforms the FS-CS problem into the Visual Question Answering (VQA) problem, utilising Vision-Language Models (VLMs), and addresses it in a training-free manner. By enabling a VLM to interact with off-the-shelf vision models as tools, the proposed method is capable of classifying and segmenting target objects using only image-level labels. Specifically, chain-of-thought prompting and in-context learning guide the VLM to answer multiple-choice questions like a human; vision models such as YOLO and Segment Anything Model (SAM) assist the VLM in completing the task. The modular framework of the proposed method makes it easily extendable. Our approach achieves state-of-the-art performance on the Pascal-5i and COCO-20i datasets.
https://arxiv.org/abs/2403.10287
Few-Shot Class-Incremental Learning (FSCIL) models aim to incrementally learn new classes with scarce samples while preserving knowledge of old ones. Existing FSCIL methods usually fine-tune the entire backbone, leading to overfitting and hindering the potential to learn new classes. On the other hand, recent prompt-based CIL approaches alleviate forgetting by training prompts with sufficient data in each task. In this work, we propose a novel framework named Attention-aware Self-adaptive Prompt (ASP). ASP encourages task-invariant prompts to capture shared knowledge by reducing specific information from the attention aspect. Additionally, self-adaptive task-specific prompts in ASP provide specific information and transfer knowledge from old classes to new classes with an Information Bottleneck learning objective. In summary, ASP prevents overfitting on the base task and does not require enormous data in few-shot incremental tasks. Extensive experiments on three benchmark datasets validate that ASP consistently outperforms state-of-the-art FSCIL and prompt-based CIL methods in terms of both learning new classes and mitigating forgetting.
https://arxiv.org/abs/2403.09857
When writing and talking, people sometimes pause to think. Although reasoning-focused works have often framed reasoning as a method of answering questions or completing agentic tasks, reasoning is implicit in almost all written text. For example, this applies to the steps not stated between the lines of a proof or to the theory of mind underlying a conversation. In the Self-Taught Reasoner (STaR, Zelikman et al. 2022), useful thinking is learned by inferring rationales from few-shot examples in question-answering and learning from those that lead to a correct answer. This is a highly constrained setting -- ideally, a language model could instead learn to infer unstated rationales in arbitrary text. We present Quiet-STaR, a generalization of STaR in which LMs learn to generate rationales at each token to explain future text, improving their predictions. We address key challenges, including 1) the computational cost of generating continuations, 2) the fact that the LM does not initially know how to generate or use internal thoughts, and 3) the need to predict beyond individual next tokens. To resolve these, we propose a tokenwise parallel sampling algorithm, using learnable tokens indicating a thought's start and end, and an extended teacher-forcing technique. Encouragingly, generated rationales disproportionately help model difficult-to-predict tokens and improve the LM's ability to directly answer difficult questions. In particular, after continued pretraining of an LM on a corpus of internet text with Quiet-STaR, we find zero-shot improvements on GSM8K (5.9%$\rightarrow$10.9%) and CommonsenseQA (36.3%$\rightarrow$47.2%) and observe a perplexity improvement of difficult tokens in natural text. Crucially, these improvements require no fine-tuning on these tasks. Quiet-STaR marks a step towards LMs that can learn to reason in a more general and scalable way.
https://arxiv.org/abs/2403.09629
In this work, we discuss building performant Multimodal Large Language Models (MLLMs). In particular, we study the importance of various architecture components and data choices. Through careful and comprehensive ablations of the image encoder, the vision language connector, and various pre-training data choices, we identified several crucial design lessons. For example, we demonstrate that, for large-scale multimodal pre-training, using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art (SOTA) few-shot results across multiple benchmarks, compared to other published pre-training results. Further, we show that the image encoder together with image resolution and the image token count has a substantial impact, while the vision-language connector design is of comparatively negligible importance. By scaling up the presented recipe, we build MM1, a family of multimodal models of up to 30B parameters, consisting of both dense models and mixture-of-experts (MoE) variants, that are SOTA in pre-training metrics and achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks. Thanks to large-scale pre-training, MM1 enjoys appealing properties such as enhanced in-context learning and multi-image reasoning, enabling few-shot chain-of-thought prompting.
https://arxiv.org/abs/2403.09611
In image-based robot manipulation tasks with large observation and action spaces, reinforcement learning struggles with low sample efficiency, slow training speed, and uncertain convergence. As an alternative, large pre-trained foundation models have shown promise in robotic manipulation, particularly in zero-shot and few-shot applications. However, using these models directly is unreliable due to limited reasoning capabilities and challenges in understanding physical and spatial contexts. This paper introduces ExploRLLM, a novel approach that leverages the inductive bias of foundation models (e.g. Large Language Models) to guide exploration in reinforcement learning. We also exploit these foundation models to reformulate the action and observation spaces to enhance the training efficiency in reinforcement learning. Our experiments demonstrate that guided exploration enables much quicker convergence than training without it. Additionally, we validate that ExploRLLM outperforms vanilla foundation model baselines and that the policy trained in simulation can be applied in real-world settings without additional training.
https://arxiv.org/abs/2403.09583
Masked Autoencoders (MAEs) learn rich low-level representations from unlabeled data but require substantial labeled data to effectively adapt to downstream tasks. Conversely, Instance Discrimination (ID) emphasizes high-level semantics, offering a potential solution to alleviate annotation requirements in MAEs. Although combining these two approaches can address downstream tasks with limited labeled data, naively integrating ID into MAEs leads to extended training times and high computational costs. To address this challenge, we introduce uaMix-MAE, an efficient ID tuning strategy that leverages unsupervised audio mixtures. Utilizing contrastive tuning, uaMix-MAE aligns the representations of pretrained MAEs, thereby facilitating effective adaptation to task-specific semantics. To optimize the model with small amounts of unlabeled data, we propose an audio mixing technique that manipulates audio samples in both input and virtual label spaces. Experiments in low/few-shot settings demonstrate that uaMix-MAE achieves 4-6% accuracy improvements across various benchmarks when tuned with limited unlabeled data, such as AudioSet-20K. Code is available at this https URL
https://arxiv.org/abs/2403.09579
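uaMix-MAE's exact mixing strategy is more involved than the abstract describes; as a rough illustration of "manipulating audio samples in both input and virtual label spaces", here is a classic mixup-style convex combination. The Beta-distributed mixing weight and the function signature are assumptions for illustration:

```python
import numpy as np

def mixup_pair(x1, x2, y1, y2, alpha=0.4, rng=None):
    """Mixup-style combination: interpolate two samples with the same
    weight in both the input space (waveform/spectrogram) and the
    (virtual) label space."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # mixing weight in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2       # mixed input
    y = lam * y1 + (1.0 - lam) * y2       # mixed (virtual) label
    return x, y, lam
```

The same interpolation weight ties the input-space and label-space mixtures together, which is what makes the resulting pairs usable for contrastive tuning without human annotations.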
In this paper, we explore the capabilities of LLMs in capturing lexical-semantic knowledge from WordNet on the example of the LLaMA-2-7b model and test it on multiple lexical semantic tasks. As the outcome of our experiments, we present TaxoLLaMA, an everything-in-one model, lightweight due to 4-bit quantization and LoRA. Across 16 tasks spanning Taxonomy Enrichment, Hypernym Discovery, Taxonomy Construction, and Lexical Entailment, it achieves 11 SotA results and 4 top-2 results. Moreover, it demonstrates very strong zero-shot performance on Lexical Entailment and Taxonomy Construction with no fine-tuning. We also explore its hidden multilingual and domain adaptation capabilities with a little tuning or few-shot learning. All datasets, code, and model are available online at this https URL
https://arxiv.org/abs/2403.09207
No previous work has studied the performance of Large Language Models (LLMs) in the context of Traditional Chinese Medicine (TCM), an essential and distinct branch of medical knowledge with a rich history. To bridge this gap, we present a TCM question dataset named TCM-QA, which comprises three question types: single choice, multiple choice, and true or false, to examine the LLM's capacity for knowledge recall and comprehensive reasoning within the TCM domain. In our study, we evaluate two settings of the LLM, zero-shot and few-shot settings, while concurrently discussing the differences between English and Chinese prompts. Our results indicate that ChatGPT performs best on true-or-false questions, achieving its highest precision of 0.688, while scoring its lowest precision, 0.241, on multiple-choice questions. Furthermore, we observed that Chinese prompts outperformed English prompts in our evaluations. Additionally, we assess the quality of explanations generated by ChatGPT and their potential contribution to TCM knowledge comprehension. This paper offers valuable insights into the applicability of LLMs in specialized domains and paves the way for future research in leveraging these powerful models to advance TCM.
https://arxiv.org/abs/2403.09164
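The zero-shot versus few-shot distinction above amounts to whether worked examples are prepended to the question. A minimal sketch of such a multiple-choice prompt builder follows; the template wording is an assumption, since the abstract does not give the actual prompts used for TCM-QA:

```python
def build_prompt(question, options, examples=()):
    """Assemble a zero-shot (no examples) or few-shot (with worked
    examples) multiple-choice prompt. Each example is a
    (question, options, answer) triple."""
    parts = []
    for ex_q, ex_opts, ex_ans in examples:
        parts.append(
            f"Question: {ex_q}\nOptions: {'; '.join(ex_opts)}\nAnswer: {ex_ans}"
        )
    # Target question goes last, with the answer left blank for the model.
    parts.append(f"Question: {question}\nOptions: {'; '.join(options)}\nAnswer:")
    return "\n\n".join(parts)
```

Swapping the template strings for Chinese equivalents yields the Chinese-prompt condition the study compares against.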
The Intelligent Transportation System (ITS) environment is known to be dynamic and distributed, where participants (vehicle users, operators, etc.) have multiple, changing and possibly conflicting objectives. Although Reinforcement Learning (RL) algorithms are commonly applied to optimize ITS applications such as resource management and offloading, most RL algorithms focus on single objectives. In many situations, converting a multi-objective problem into a single-objective one is impossible, intractable or insufficient, making such RL algorithms inapplicable. We propose a multi-objective, multi-agent reinforcement learning (MARL) algorithm with high learning efficiency and low computational requirements, which automatically triggers adaptive few-shot learning in a dynamic, distributed and noisy environment with sparse and delayed reward. We test our algorithm in an ITS environment with edge cloud computing. Empirical results show that the algorithm is quick to adapt to new environments and performs better in all individual and system metrics compared to the state-of-the-art benchmark. Our algorithm also addresses various practical concerns with its modularized and asynchronous online training method. In addition to the cloud simulation, we test our algorithm on a single-board computer and show that it can make inference in 6 milliseconds.
https://arxiv.org/abs/2403.08879
Knee osteoarthritis is a degenerative joint disease that induces chronic pain and disability. Bone morphological analysis is a promising tool to understand the mechanical aspect of this disorder. This study proposes a 2D bone morphological analysis using manually segmented bones to explore morphological features related to distinct pain conditions. Furthermore, six semantic segmentation algorithms are assessed for extracting femur and tibia bones from X-ray images. Our analysis reveals that the morphology of the femur undergoes significant changes in instances where pain worsens. Conversely, improvements in pain may not manifest as pronounced alterations in bone shape. The few-shot-learning-based algorithm, UniverSeg, demonstrated superior segmentation results with Dice scores of 99.69% for the femur and 99.60% for the tibia. Regarding pain condition classification, the zero-shot-learning-based algorithm, CP-SAM, achieved the highest accuracy at 66% among all models. UniverSeg is recommended for automatic knee bone segmentation, while SAM models show potential with prompt encoder modifications for optimized outcomes. These findings highlight the effectiveness of few-shot learning for semantic segmentation and the potential of zero-shot learning in enhancing classification models for knee osteoarthritis diagnosis.
https://arxiv.org/abs/2403.08761
The challenge of accessing historical patient data for clinical research, while adhering to privacy regulations, is a significant obstacle in medical science. An innovative approach to circumvent this issue involves utilising synthetic medical records that mirror real patient data without compromising individual privacy. The creation of these synthetic datasets, particularly without using actual patient data to train Large Language Models (LLMs), presents a novel solution, as gaining access to sensitive patient information to train models is also a challenge. This study assesses the capability of the Llama 2 LLM to create synthetic medical records that accurately reflect real patient information, employing zero-shot and few-shot prompting strategies for comparison against fine-tuned methodologies that do require sensitive patient data during training. We focus on generating synthetic narratives for the History of Present Illness section, utilising data from the MIMIC-IV dataset for comparison. In this work, we introduce a novel prompting technique that leverages a chain-of-thought approach, enhancing the model's ability to generate more accurate and contextually relevant medical narratives without prior fine-tuning. Our findings suggest that this chain-of-thought prompted approach allows the zero-shot model to achieve results on par with those of fine-tuned models, based on Rouge metrics evaluation.
https://arxiv.org/abs/2403.08664
Chinese Spell Checking (CSC) is a widely used technology, which plays a vital role in speech to text (STT) and optical character recognition (OCR). Most existing CSC approaches rely on the BERT architecture and achieve excellent performance. However, limited by the scale of the foundation model, BERT-based methods do not work well in few-shot scenarios, showing certain limitations in practical applications. In this paper, we explore using an in-context learning method named RS-LLM (Rich Semantic based LLMs) to introduce large language models (LLMs) as the foundation model. Besides, we study the impact of introducing various Chinese rich semantic information in our framework. We found that by introducing a small number of specific Chinese rich semantic structures, LLMs achieve better performance than the BERT-based model on the few-shot CSC task. Furthermore, we conduct experiments on multiple datasets, and the experimental results verified the superiority of our proposed framework.
https://arxiv.org/abs/2403.08492
Conventional wisdom suggests parameter-efficient fine-tuning of foundation models as the state-of-the-art method for transfer learning in vision, replacing the rich literature of alternatives such as meta-learning. In trying to harness the best of both worlds, meta-tuning introduces a subsequent optimization stage of foundation models but has so far only shown limited success and crucially tends to underperform on out-of-domain (OOD) tasks. In this paper, we introduce Sparse MetA-Tuning (SMAT), a method inspired by sparse mixture-of-experts approaches and trained to isolate subsets of pre-trained parameters automatically for meta-tuning on each task. SMAT successfully overcomes OOD sensitivity and delivers on the promise of enhancing the transfer abilities of vision foundation models beyond parameter-efficient finetuning. We establish new state-of-the-art results on a challenging combination of Meta-Dataset augmented with additional OOD tasks in both zero-shot and gradient-based adaptation settings. In addition, we provide a thorough analysis of the superiority of learned over hand-designed sparsity patterns for sparse expert methods and the pivotal importance of the sparsity level in balancing between in-domain and out-of-domain generalization. Our code is publicly available.
https://arxiv.org/abs/2403.08477
One of the ways Large Language Models (LLMs) are used to perform machine learning tasks is to provide them with a few examples before asking them to produce a prediction. This is a meta-learning process known as few-shot learning. In this paper, we use available Search-Based methods to optimise the number and combination of examples that can improve an LLM's estimation performance, when it is used to estimate story points for new agile tasks. Our preliminary results show that our SBSE technique improves the estimation performance of the LLM by 59.34% on average (in terms of mean absolute error of the estimation) over three datasets against a zero-shot setting.
https://arxiv.org/abs/2403.08430
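The abstract does not name the specific search algorithm used, so as a stand-in, here is a brute-force search over the number and combination of few-shot examples against a user-supplied MAE evaluator. The function names and the exhaustive strategy are assumptions; the paper's SBSE technique would replace the nested loops with a smarter search:

```python
import itertools

def best_example_subset(pool, eval_mae, max_k=2):
    """Search over all subsets of `pool` with size 0..max_k, keeping
    the combination of few-shot examples that yields the lowest mean
    absolute error according to `eval_mae(subset)`."""
    best, best_mae = (), float("inf")
    for k in range(max_k + 1):                      # vary the *number* of examples
        for subset in itertools.combinations(pool, k):  # vary the *combination*
            mae = eval_mae(subset)
            if mae < best_mae:
                best, best_mae = subset, mae
    return best, best_mae
```

The k = 0 case corresponds to the zero-shot baseline the paper compares against, so the search can only match or improve on it under the same evaluator.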
Adverse drug-drug interactions (DDIs) can compromise the effectiveness of concurrent drug administration, posing a significant challenge in healthcare. As the development of new drugs continues, the potential for unknown adverse effects resulting from DDIs becomes a growing concern. Traditional computational methods for DDI prediction may fail to capture interactions for new drugs due to the lack of knowledge. In this paper, we introduce a new problem setup as zero-shot DDI prediction that deals with the case of new drugs. Leveraging textual information from online databases like DrugBank and PubChem, we propose an innovative approach TextDDI with a language model-based DDI predictor and a reinforcement learning (RL)-based information selector, enabling the selection of concise and pertinent text for accurate DDI prediction on new drugs. Empirical results show the benefits of the proposed approach on several settings including zero-shot and few-shot DDI prediction, and the selected texts are semantically relevant. Our code and data are available at \url{this https URL}.
https://arxiv.org/abs/2403.08377
Fine-grained ship classification in remote sensing (RS-FGSC) poses a significant challenge due to the high similarity between classes and the limited availability of labeled data, limiting the effectiveness of traditional supervised classification methods. Recent advancements in large pre-trained Vision-Language Models (VLMs) have demonstrated impressive capabilities in few-shot or zero-shot learning, particularly in understanding image content. This study delves into harnessing the potential of VLMs to enhance classification accuracy for unseen ship categories, which holds considerable significance in scenarios with restricted data due to cost or privacy constraints. Directly fine-tuning VLMs for RS-FGSC often encounters the challenge of overfitting the seen classes, resulting in suboptimal generalization to unseen classes, which highlights the difficulty in differentiating complex backgrounds and capturing distinct ship features. To address these issues, we introduce a novel prompt tuning technique that employs a hierarchical, multi-granularity prompt design. Our approach integrates remote sensing ship priors through bias terms, learned from a small trainable network. This strategy enhances the model's generalization capabilities while improving its ability to discern intricate backgrounds and learn discriminative ship features. Furthermore, we contribute to the field by introducing a comprehensive dataset, FGSCM-52, significantly expanding existing datasets with more extensive data and detailed annotations for less common ship classes. Extensive experimental evaluations demonstrate the superiority of our proposed method over current state-of-the-art techniques. The source code will be made publicly available.
https://arxiv.org/abs/2403.08271
Prompting methods play a crucial role in enhancing the capabilities of pre-trained large language models (LLMs). We explore how contrastive prompting (CP) significantly improves the ability of large language models to perform complex reasoning. We demonstrate that LLMs are decent contrastive reasoners by simply adding "Let's give a correct and a wrong answer." before LLMs provide answers. Experiments on two large language models show that zero-shot contrastive prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks without any hand-crafted few-shot examples, such as increasing the accuracy on GSM8K from 35.9% to 88.8% and AQUA-RAT from 41.3% to 62.2% with the state-of-the-art GPT-4 model. Our method not only surpasses zero-shot CoT and few-shot CoT in most arithmetic and commonsense reasoning tasks but also can seamlessly integrate with existing prompting methods, resulting in improved or comparable results when compared to state-of-the-art methods. Our code is available at this https URL
https://arxiv.org/abs/2403.08211
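The trigger sentence is quoted directly in the abstract; wrapping it into a prompt builder is straightforward, though the Q/A layout around it is an assumption rather than the paper's exact template:

```python
# Trigger sentence quoted from the contrastive prompting (CP) abstract.
CP_TRIGGER = "Let's give a correct and a wrong answer."

def contrastive_prompt(question):
    """Zero-shot contrastive prompting: append the trigger before the
    model's answer, with no hand-crafted few-shot examples."""
    return f"Q: {question}\nA: {CP_TRIGGER}"
```

As with zero-shot chain-of-thought prompting, the only change relative to a plain zero-shot prompt is this one appended sentence, which makes the method trivial to combine with other prompting schemes.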