Abstract
Faithful evaluation of language model capabilities is crucial for deriving actionable insights that can inform model development. However, rigorous causal evaluations in this domain face significant methodological challenges, including complex confounding effects and prohibitive computational costs associated with extensive retraining. To tackle these challenges, we propose a causal representation learning framework wherein observed benchmark performance is modeled as a linear transformation of a few latent capability factors. Crucially, these latent factors are identified as causally interrelated after appropriately controlling for the base model as a common confounder. Applying this approach to a comprehensive dataset encompassing over 1500 models evaluated across six benchmarks from the Open LLM Leaderboard, we identify a concise three-node linear causal structure that reliably explains the observed performance variations. Further interpretation of this causal structure provides substantial scientific insights beyond simple numerical rankings: specifically, we reveal a clear causal direction starting from general problem-solving capabilities, advancing through instruction-following proficiency, and culminating in mathematical reasoning ability. Our results underscore the essential role of carefully controlling base model variations during evaluation, a step critical to accurately uncovering the underlying causal relationships among latent model capabilities.
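To make the described setup concrete, below is a minimal generative sketch of the kind of model the abstract outlines: benchmark scores as a linear transformation of a few latent capability factors, with the base model acting as a common confounder and the latent factors linked in a three-node causal chain. All variable names, dimensions, and coefficients (e.g., n_models, A, the 0.8/0.7 weights) are illustrative assumptions, not the paper's actual implementation or data.

```python
# Hedged sketch: a simple Gaussian linear model consistent with the abstract's description.
import numpy as np

rng = np.random.default_rng(0)
n_models, n_bench, n_latent, n_bases = 1500, 6, 3, 20

# Base model acts as a common confounder: each fine-tuned model inherits a base-specific offset.
base_id = rng.integers(0, n_bases, size=n_models)
base_effect = rng.normal(size=(n_bases, n_latent))

# Latent capabilities follow a linear causal chain (illustrative coefficients):
#   general problem-solving -> instruction following -> mathematical reasoning
z = np.zeros((n_models, n_latent))
z[:, 0] = base_effect[base_id, 0] + rng.normal(scale=0.5, size=n_models)
z[:, 1] = 0.8 * z[:, 0] + base_effect[base_id, 1] + rng.normal(scale=0.5, size=n_models)
z[:, 2] = 0.7 * z[:, 1] + base_effect[base_id, 2] + rng.normal(scale=0.5, size=n_models)

# Observed benchmark scores are a linear transformation of the latent factors plus noise.
A = rng.normal(size=(n_bench, n_latent))
x = z @ A.T + rng.normal(scale=0.1, size=(n_models, n_bench))

# Control for the base-model confounder by residualizing scores on base-model indicators
# before inspecting relations among the (estimated) latent capabilities.
D = np.eye(n_bases)[base_id]                      # one-hot base-model design matrix
beta, *_ = np.linalg.lstsq(D, x, rcond=None)
x_resid = x - D @ beta

print("benchmark correlations (raw):\n", np.round(np.corrcoef(x, rowvar=False), 2))
print("benchmark correlations (base model controlled):\n",
      np.round(np.corrcoef(x_resid, rowvar=False), 2))
```

Comparing the raw and residualized correlation matrices illustrates the abstract's central point: without controlling for the base model, the shared base-model effect inflates apparent dependence among benchmark scores, obscuring the causal structure among the latent capabilities.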
URL
https://arxiv.org/abs/2506.10378