Abstract
Recent research shows that in-context learning (ICL) can be effective even when demonstrations have missing or incorrect labels. To shed light on this capability, we examine a canonical setting where the demonstrations are drawn according to a binary Gaussian mixture model (GMM) and a certain fraction of the demonstrations have missing labels. We provide a comprehensive theoretical study showing that: (1) the loss landscape of one-layer linear attention models recovers the optimal fully-supervised estimator but completely fails to exploit unlabeled data; (2) in contrast, multilayer or looped transformers can effectively leverage unlabeled data by implicitly constructing estimators of the form $\sum_{i\ge 0} a_i (X^\top X)^i X^\top y$, where $X$ and $y$ denote the features and the partially observed labels (with missing entries set to zero). We characterize, as a function of depth, the class of polynomials that can be expressed in this way, and draw connections to Expectation Maximization, an iterative pseudo-labeling algorithm commonly used in semi-supervised learning. Importantly, the leading polynomial power is exponential in depth, so a mild amount of depth/looping suffices. As an application of our theory, we propose looping off-the-shelf tabular foundation models to enhance their semi-supervised learning capabilities. Extensive evaluations on real-world datasets show that our method significantly improves semi-supervised tabular learning performance over standard single-pass inference.
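To make the estimator family concrete, below is a minimal NumPy sketch (not from the paper; all sizes and coefficients are illustrative) that forms $\sum_{i\ge 0} a_i (X^\top X)^i X^\top y$ on synthetic binary GMM data, with the labels of unlabeled demonstrations set to zero as in the paper's setup:

```python
# Minimal sketch of the polynomial estimator family described in the abstract:
#   w = sum_{i>=0} a_i (X^T X)^i X^T y
# Data sizes and the coefficients a_i below are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary GMM: labels y in {-1, +1}, features x = y * mu + noise.
n, d, labeled_frac = 200, 10, 0.3        # assumed illustrative sizes
mu = rng.normal(size=d) / np.sqrt(d)
y_true = rng.choice([-1.0, 1.0], size=n)
X = y_true[:, None] * mu + rng.normal(size=(n, d))

# Partially observed labels: missing entries are zeroed out.
mask = rng.random(n) < labeled_frac
y = np.where(mask, y_true, 0.0)

def poly_estimator(X, y, coeffs):
    """Return w = sum_i coeffs[i] * (X^T X)^i @ X^T y."""
    v = X.T @ y                  # starts at (X^T X)^0 X^T y
    w = np.zeros_like(v)
    for a in coeffs:
        w += a * v
        v = (X.T @ X) @ v        # raise the power of (X^T X) by one
    return w

# Hypothetical coefficients; the paper characterizes, as a function of
# depth, which such polynomials a looped transformer can realize.
w = poly_estimator(X, y, coeffs=[1.0, -0.01, 1e-4])
acc = np.mean(np.sign(X @ w) == y_true)
print(f"accuracy of polynomial estimator: {acc:.2f}")
```

Each additional multiplication by $X^\top X$ corresponds to extra depth or an extra loop; the abstract's point is that multilayer or looped transformers construct such estimators implicitly, with the leading polynomial power growing exponentially in depth.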
URL
https://arxiv.org/abs/2506.15329