Abstract
How can we use AI to discover a new state of the art for a scientific problem? Prior work in test-time scaling, such as AlphaEvolve, performs search by prompting a frozen LLM. We instead perform reinforcement learning at test time, so the LLM can continue to train, now with experience specific to the test problem. This form of continual learning is unusual, because its goal is to produce one great solution rather than many good ones on average, and to solve this very problem rather than generalize to other problems. Therefore, our learning objective and search subroutine are designed to prioritize the most promising solutions. We call this method Test-Time Training to Discover (TTT-Discover). Following prior work, we focus on problems with continuous rewards. We report results for every problem we attempted, across mathematics, GPU kernel engineering, algorithm design, and biology. TTT-Discover sets the new state of the art in almost all of them: (i) Erdős' minimum overlap problem and an autocorrelation inequality; (ii) a GPUMode kernel competition (up to $2\times$ faster than prior art); (iii) past AtCoder algorithm competitions; and (iv) a denoising problem in single-cell analysis. Our solutions are reviewed by experts or the organizers. All our results are achieved with an open model, OpenAI gpt-oss-120b, and can be reproduced with our publicly available code, in contrast to previous best results that required closed frontier models. Our test-time training runs are performed using Tinker, an API by Thinking Machines, at a cost of only a few hundred dollars per problem.
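The abstract says the learning objective prioritizes the most promising solutions rather than the average, but it does not give the objective's form. As a hedged illustration only, the Python sketch below contrasts a standard mean-baseline advantage (which optimizes expected reward) with an exponentially tilted advantage, one generic way to concentrate the learning signal on the best samples in a batch. The function names and the temperature `beta` are hypothetical and are not taken from TTT-Discover. The tilted form is a batch estimate of the gradient weighting for the soft-max objective (1/beta) * log E[exp(beta * R)]: as beta approaches 0 it recovers the mean-baseline advantage, and for sufficiently large beta it gives a positive advantage only to the best sample.

```python
import math
from typing import List


def mean_baseline_advantages(rewards: List[float]) -> List[float]:
    """Advantage = reward minus batch mean; this pushes up the *average* reward."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def tilted_advantages(rewards: List[float], beta: float = 5.0) -> List[float]:
    """Hypothetical max-seeking advantage (an assumption, not TTT-Discover's objective).

    Weights w_i = exp(beta * r_i) / mean_j exp(beta * r_j) are self-normalized,
    so they average to 1, and the advantage (w_i - 1) / beta sums to zero.
    Small beta recovers r_i - mean(r); sufficiently large beta concentrates
    all positive advantage on the highest-reward sample.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    top = max(rewards)  # subtract the max before exp() for numerical stability
    exps = [math.exp(beta * (r - top)) for r in rewards]
    mean_exp = sum(exps) / len(exps)  # >= 1/len(exps), so never zero
    return [(e / mean_exp - 1.0) / beta for e in exps]


if __name__ == "__main__":
    # Illustrative (made-up) rewards for four sampled solutions to one test problem.
    rewards = [0.10, 0.20, 0.85, 0.90]
    print("mean baseline:", [round(a, 3) for a in mean_baseline_advantages(rewards)])
    for beta in (0.01, 5.0, 50.0):
        print(f"tilted, beta={beta}:", [round(a, 3) for a in tilted_advantages(rewards, beta)])
```

With beta near 0 the tilted and mean-baseline rows nearly coincide. At beta=50 the 0.85 solution, although well above the batch average, receives a negative advantage because it is not the best one, which illustrates under these assumptions how a max-seeking objective differs from an average-seeking one.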
URL
https://arxiv.org/abs/2601.16175