Abstract
Large reasoning models (LRMs) like OpenAI-o1 have demonstrated impressive long stepwise reasoning capabilities through large-scale reinforcement learning. However, their extended reasoning processes often suffer from knowledge insufficiency, leading to frequent uncertainties and potential errors. To address this limitation, we introduce \textbf{Search-o1}, a framework that enhances LRMs with an agentic retrieval-augmented generation (RAG) mechanism and a Reason-in-Documents module for refining retrieved documents. Search-o1 integrates an agentic search workflow into the reasoning process, enabling dynamic retrieval of external knowledge when LRMs encounter uncertain knowledge points. Additionally, due to the verbose nature of retrieved documents, we design a separate Reason-in-Documents module to deeply analyze the retrieved information before injecting it into the reasoning chain, minimizing noise and preserving coherent reasoning flow. Extensive experiments on complex reasoning tasks in science, mathematics, and coding, as well as six open-domain QA benchmarks, demonstrate the strong performance of Search-o1. This approach enhances the trustworthiness and applicability of LRMs in complex reasoning tasks, paving the way for more reliable and versatile intelligent systems. The code is available at \url{this https URL}.
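The workflow described above (generate reasoning steps, detect uncertainty, retrieve, refine retrieved documents, continue reasoning) can be sketched as a simple control loop. This is an illustrative sketch only, not the authors' implementation: the `<|search|>`/`<|end_search|>` markers, the `llm`, `search`, and `refine` callables, and the loop structure are all assumptions made for exposition.

```python
def search_o1(question, llm, search, refine, max_searches=5):
    """Hypothetical sketch of an agentic RAG reasoning loop.

    llm(prompt)            -> next reasoning segment; may emit a query
                              wrapped in <|search|>...<|end_search|>
                              (hypothetical special tokens).
    search(query)          -> list of raw retrieved documents.
    refine(q, chain, docs) -> concise evidence string distilled from the
                              documents (the Reason-in-Documents role).
    """
    chain = ""
    for _ in range(max_searches):
        step = llm(question + chain)
        chain += step
        if "<|search|>" not in step:
            break  # model finished reasoning without needing retrieval
        # Extract the model-issued search query.
        query = step.split("<|search|>")[1].split("<|end_search|>")[0]
        docs = search(query)
        # Condense verbose documents before injecting them into the
        # chain, to minimize noise and keep the reasoning coherent.
        chain += refine(question, chain, docs)
    return chain
```

The key design point reflected here is that raw documents never enter the reasoning chain directly; only the refined summary does.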
URL
https://arxiv.org/abs/2501.05366