Paper Reading AI Learner

Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models

2025-06-13 06:25:59
Chenrui Cao, Liangcheng Song, Zenan Li, Xinyi Le, Xian Zhang, Hui Xue, Fan Yang

Abstract

Recent advancements, such as DeepSeek-Prover-V2-671B and Kimina-Prover-Preview-72B, demonstrate a prevailing trend toward large-scale reinforcement learning (RL)-based training for automated theorem proving. Surprisingly, we find that even without any training, careful neuro-symbolic coordination of existing off-the-shelf reasoning models and tactic step provers can achieve comparable performance. This paper introduces DSP+, an improved version of the Draft, Sketch, and Prove framework, featuring a fine-grained and integrated neuro-symbolic enhancement for each phase: (1) in the draft phase, we prompt reasoning models to generate concise natural-language subgoals that benefit the sketch phase, removing thinking tokens and references to human-written proofs; (2) in the sketch phase, subgoals are autoformalized together with hypotheses to benefit the proving phase, and sketch lines containing syntactic errors are masked according to predefined rules; (3) in the proving phase, we tightly integrate symbolic search methods such as Aesop with step provers to establish proofs of the sketch subgoals. Experimental results show that, without any additional model training or fine-tuning, DSP+ solves 80.7% of miniF2F, 32.8% of ProofNet, and 24 of the 644 PutnamBench problems, while requiring a smaller budget than state-of-the-art systems. DSP+ proves imo_2019_p1, an IMO problem in miniF2F that no prior work has solved. Additionally, DSP+ generates proof patterns comprehensible to human experts, facilitating the identification of formalization errors; for example, it uncovers eight incorrectly formalized statements in miniF2F. Our results highlight the potential of classical reasoning patterns beyond RL-based training. All components will be open-sourced.

Abstract (translated)

Recent advances, such as DeepSeek-Prover-V2-671B and Kimina-Prover-Preview-72B, demonstrate a trend toward large-scale reinforcement learning (RL)-based training for automated theorem proving. Surprisingly, we find that even without any training, careful coordination of existing off-the-shelf reasoning models with step provers can reach a comparable level of performance. This paper introduces DSP+, an improved version of the Draft, Sketch, and Prove framework that applies a fine-grained and integrated neuro-symbolic enhancement to each phase: (1) in the draft phase, we prompt reasoning models to generate concise natural-language subgoals to aid the sketch phase, removing thinking tokens and references to human-written proofs; (2) in the sketch phase, subgoals are autoformalized together with hypotheses to prepare for the proving phase, and sketch lines containing syntactic errors are masked according to predefined rules; (3) in the proving phase, symbolic search methods such as Aesop are tightly integrated with step provers to establish proofs of the sketch subgoals. Experimental results show that, without any additional model training or fine-tuning, DSP+ solves 80.7%, 32.8%, and 24 of the 644 problems from miniF2F, ProofNet, and PutnamBench, respectively, at a lower cost budget than existing state-of-the-art methods. In particular, DSP+ proves imo_2019_p1, an IMO (International Mathematical Olympiad) problem in miniF2F that no prior work has solved. Moreover, DSP+ produces proof patterns that human experts can understand, which helps identify formalization errors; for example, eight incorrectly formalized statements were found in miniF2F. Our results highlight the great potential of classical reasoning patterns beyond RL-based training. All components will be open-sourced.
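To make the draft-sketch-prove pipeline described above concrete, here is a minimal, hypothetical Lean 4 sketch. It is not taken from the paper; the statement, names, and closing tactics are illustrative assumptions. The `have` step plays the role of a natural-language subgoal produced in the draft phase, the surrounding proof skeleton corresponds to the sketch phase, and the remaining hole is closed by an automated tactic, in the spirit of the Aesop-plus-step-prover integration used in the proving phase.

```lean
import Mathlib.Tactic

-- Toy statement (illustrative only): if a ≤ b then a ≤ b + 1.
example (a b : ℕ) (h : a ≤ b) : a ≤ b + 1 := by
  -- Drafted subgoal: "b is at most b + 1", formalized as a `have` with a hole;
  -- here the hole is closed by `omega`, whereas DSP+ would dispatch it to the
  -- combined Aesop and step-prover search.
  have step : b ≤ b + 1 := by omega
  -- Chain the hypothesis with the established subgoal to close the main goal.
  exact le_trans h step
```

In DSP+ itself, each such hole would be attempted by the integrated symbolic search and step prover rather than by a single hand-picked tactic, and sketch lines that fail to parse would be masked before proving begins.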

URL

https://arxiv.org/abs/2506.11487

PDF

https://arxiv.org/pdf/2506.11487.pdf

