Abstract
In-context learning (ICL) is the ability of a large language model (LLM) to learn a new task from a few demonstrations presented as part of the context. Past studies have attributed much of the success of ICL to the way these in-context demonstrations are represented, particularly to how labels are represented in classification tasks. On the other hand, observations of the learning capacity of ICL (i.e., the extent to which more in-context demonstrations lead to higher performance) have been mixed, and ICL is often thought to occur only under specific conditions. The interaction between these two aspects of ICL, representation and learning, has not been studied in depth until now. We hypothesize that they are largely independent of one another: the representation of demonstrations determines the baseline accuracy of ICL, while learning from additional demonstrations improves only on top of this baseline. We validate this hypothesis by developing an optimization algorithm that enumerates a spectrum of possible label sets (representations) varying in semantic relevance. We then perform ICL with varying numbers of in-context demonstrations for each of these label sets. We observe that learning happens regardless of the quality of the label set itself, although its efficiency, measured by the slope of improvement over in-context demonstrations, depends on both the label set quality and the parameter count of the underlying language model. Despite the emergence of learning, the relative quality (accuracy) of a chosen label set (representation) is largely maintained throughout learning, confirming our hypothesis and implying that representation and learning are orthogonal. Our work reveals a previously underexplored aspect of ICL: the independent effects of learning from demonstrations and of their representations on ICL performance.
URL
https://arxiv.org/abs/2510.08372