Abstract
As machine learning (ML) gains widespread adoption, practitioners are increasingly seeking means to quantify and control the risk these systems incur. This challenge is especially salient when ML systems have autonomy to collect their own data, such as in black-box optimization and active learning, where their actions induce sequential feedback-loop shifts in the data distribution. Conformal prediction has emerged as a promising approach to uncertainty and risk quantification, but existing variants either fail to accommodate sequences of data-dependent shifts, or do not fully exploit the fact that agent-induced shift is under our control. In this work we prove that conformal prediction can theoretically be extended to *any* joint data distribution, not just exchangeable or quasi-exchangeable ones, although it is exceedingly impractical to compute in the most general case. For practical applications, we outline a procedure for deriving specific conformal algorithms for any data distribution, and we use this procedure to derive tractable algorithms for a series of agent-induced covariate shifts. We evaluate the proposed algorithms empirically on synthetic black-box optimization and active learning tasks.
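The paper's general algorithms are not reproduced here, but as a point of reference, below is a minimal sketch of weighted split conformal prediction under a single known covariate shift (Tibshirani et al., 2019), the standard special case that this work generalizes to sequences of agent-induced shifts. The function name `weighted_conformal_quantile` and the assumption that the likelihood ratio dP_test/dP_cal is known in closed form are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def weighted_conformal_quantile(cal_scores, cal_weights, test_weight, alpha):
    """(1 - alpha) quantile of the weighted calibration-score distribution.

    cal_scores  : nonconformity scores s(x_i, y_i) on a held-out calibration set
    cal_weights : likelihood ratios w(x_i) = dP_test(x_i) / dP_cal(x_i)
    test_weight : likelihood ratio w(x) at the test point
    alpha       : miscoverage level, e.g. 0.1 for 90% coverage
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    cal_weights = np.asarray(cal_weights, dtype=float)
    order = np.argsort(cal_scores)
    sorted_scores = cal_scores[order]
    # Normalize the weights; the leftover mass sits on a +inf score
    # standing in for the unseen test score.
    norm = cal_weights.sum() + test_weight
    cum_mass = np.cumsum(cal_weights[order] / norm)
    idx = np.searchsorted(cum_mass, 1.0 - alpha)
    if idx >= len(sorted_scores):
        return np.inf  # the quantile lands on the +inf point mass
    return sorted_scores[idx]

# Hypothetical usage with absolute-residual scores from a fitted model mu:
#   scores = np.abs(y_cal - mu(X_cal))
#   q = weighted_conformal_quantile(scores, w(X_cal), w(x_test), alpha=0.1)
#   Prediction interval at x_test: [mu(x_test) - q, mu(x_test) + q]
```

The point mass at +inf accounts for the not-yet-observed test score; it is what preserves the marginal coverage guarantee when the calibration and test covariate distributions differ.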
URL
https://arxiv.org/abs/2405.06627