Paper Reading AI Learner

Towards Large Language Models as Copilots for Theorem Proving in Lean

2024-04-18 22:54:08
Peiyang Song, Kaiyu Yang, Anima Anandkumar

Abstract

Theorem proving is an important challenge for large language models (LLMs), as formal proofs can be checked rigorously by proof assistants such as Lean, leaving no room for hallucination. Existing LLM-based provers try to prove theorems in a fully autonomous mode without human intervention. In this mode, they struggle with novel and challenging theorems, for which human insights may be critical. In this paper, we explore LLMs as copilots that assist humans in proving theorems. We introduce Lean Copilot, a framework for running LLM inference in Lean. It enables programmers to build various LLM-based proof automation tools that integrate seamlessly into the workflow of Lean users. Using Lean Copilot, we build tools that use LLMs to suggest proof steps (tactic suggestion), complete intermediate proof goals (proof search), and select relevant premises (premise selection). Users can use our pretrained models or bring their own, running either locally (with or without GPUs) or on the cloud. Experimental results demonstrate the effectiveness of our method in assisting humans and automating the theorem-proving process, compared to existing rule-based proof automation in Lean. We open-source all code under a permissive MIT license to facilitate further research.
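Concretely, each tool is invoked as a tactic inside a Lean proof. A minimal usage sketch is below; the tactic names (`suggest_tactics`, `search_proof`, `select_premises`) follow the Lean Copilot project, while the example theorem itself is an illustrative placeholder:

```lean
import LeanCopilot

-- Tactic suggestion: ask the LLM for candidate next steps at the
-- current goal; suggestions appear in the infoview for the user to apply.
example (a b c : Nat) : a + b + c = a + c + b := by
  suggest_tactics

-- Proof search: let Lean Copilot combine LLM-generated tactics with
-- search to try to close the goal automatically.
example (a b c : Nat) : a + b + c = a + c + b := by
  search_proof
```

In the copilot workflow described above, the user stays in the loop: suggested tactics are checked by Lean before being displayed, so accepting a suggestion can never introduce an unsound step.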


URL

https://arxiv.org/abs/2404.12534

PDF

https://arxiv.org/pdf/2404.12534.pdf

