Learning to Discover at Test Time

2026-01-22 18:24:00
Mert Yuksekgonul, Daniel Koceja, Xinhao Li, Federico Bianchi, Jed McCaleb, Xiaolong Wang, Jan Kautz, Yejin Choi, James Zou, Carlos Guestrin, Yu Sun

Abstract

How can we use AI to discover a new state of the art for a scientific problem? Prior work in test-time scaling, such as AlphaEvolve, performs search by prompting a frozen LLM. We perform reinforcement learning at test time, so the LLM can continue to train, but now with experience specific to the test problem. This form of continual learning is quite special, because its goal is to produce one great solution rather than many good ones on average, and to solve this very problem rather than generalize to other problems. Therefore, our learning objective and search subroutine are designed to prioritize the most promising solutions. We call this method Test-Time Training to Discover (TTT-Discover). Following prior work, we focus on problems with continuous rewards. We report results for every problem we attempted, across mathematics, GPU kernel engineering, algorithm design, and biology. TTT-Discover sets the new state of the art in almost all of them: (i) Erdős' minimum overlap problem and an autocorrelation inequality; (ii) a GPUMode kernel competition (up to $2\times$ faster than prior art); (iii) past AtCoder algorithm competitions; and (iv) a denoising problem in single-cell analysis. Our solutions are reviewed by experts or the organizers. All our results are achieved with an open model, OpenAI gpt-oss-120b, and can be reproduced with our publicly available code, in contrast to previous best results that required closed frontier models. Our test-time training runs are performed using Tinker, an API by Thinking Machines, with a cost of only a few hundred dollars per problem.
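
The abstract does not spell out the learning objective, so the following is only a toy illustration of the stated goal: optimize for the single best solution to one problem with a continuous reward, rather than for average quality. It is not the TTT-Discover algorithm. It swaps in a cross-entropy-style elite update on a one-dimensional Gaussian "policy", and the reward function, the function name `discover`, and all hyperparameters are hypothetical.

```python
import random


def reward(x: float) -> float:
    # Hypothetical continuous reward for one fixed "test problem";
    # its maximum (0.0) is at x = 3.
    return -(x - 3.0) ** 2


def discover(steps: int = 50, samples: int = 32, elite: int = 4, seed: int = 0):
    """Toy search-and-update loop that keeps only the best solution found.

    Assumption: a cross-entropy-style update stands in for the paper's
    (unspecified here) reinforcement learning objective.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 2.0                      # the "policy" being adapted
    best_x, best_r = mean, reward(mean)       # best-so-far solution

    for _ in range(steps):
        # Sample candidate solutions from the current policy.
        candidates = [rng.gauss(mean, std) for _ in range(samples)]
        ranked = sorted(candidates, key=reward, reverse=True)
        top = ranked[:elite]

        # Update the policy using only the most promising candidates,
        # instead of averaging over every sample.
        mean = sum(top) / elite
        variance = sum((x - mean) ** 2 for x in top) / elite
        std = max(0.05, variance ** 0.5)      # floor keeps some exploration

        # The output is one great solution, not the final policy's average.
        if reward(top[0]) > best_r:
            best_x, best_r = top[0], reward(top[0])

    return best_x, best_r


if __name__ == "__main__":
    x, r = discover()
    print(f"best solution: {x:.4f}, reward: {r:.6f}")
```

The design choice worth noting is that the return value is the best sample seen during the run, while the policy exists only to steer sampling toward that region; this mirrors the abstract's emphasis on solving "this very problem" rather than generalizing.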

URL

https://arxiv.org/abs/2601.16175

PDF

https://arxiv.org/pdf/2601.16175.pdf

