Paper Reading AI Learner

Towards Large Language Models as Copilots for Theorem Proving in Lean

2024-04-18 22:54:08
Peiyang Song, Kaiyu Yang, Anima Anandkumar

Abstract

Theorem proving is an important challenge for large language models (LLMs), as formal proofs can be checked rigorously by proof assistants such as Lean, leaving no room for hallucination. Existing LLM-based provers try to prove theorems in a fully autonomous mode without human intervention. In this mode, they struggle with novel and challenging theorems, for which human insights may be critical. In this paper, we explore LLMs as copilots that assist humans in proving theorems. We introduce Lean Copilot, a framework for running LLM inference in Lean. It enables programmers to build various LLM-based proof automation tools that integrate seamlessly into the workflow of Lean users. Using Lean Copilot, we build tools that use LLMs to suggest proof steps (tactic suggestion), complete intermediate proof goals (proof search), and select relevant premises (premise selection). Users can use our pretrained models or bring their own, running either locally (with or without GPUs) or on the cloud. Experimental results demonstrate the effectiveness of our method in assisting humans and automating the theorem-proving process, compared to existing rule-based proof automation in Lean. We open-source all code under a permissive MIT license to facilitate further research.
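Concretely, each tool is invoked as a tactic inside a Lean proof. A minimal usage sketch is below; the tactic names (`suggest_tactics`, `search_proof`, `select_premises`) follow the Lean Copilot project, while the example theorem itself is an illustrative placeholder:

```lean
import LeanCopilot

-- Tactic suggestion: ask the LLM for candidate next steps at the
-- current goal; suggestions appear in the infoview for the user to apply.
example (a b c : Nat) : a + b + c = a + c + b := by
  suggest_tactics

-- Proof search: let Lean Copilot combine LLM-generated tactics with
-- search to try to close the goal automatically.
example (a b c : Nat) : a + b + c = a + c + b := by
  search_proof
```

In the copilot workflow described above, the user stays in the loop: suggested tactics are checked by Lean before being displayed, so accepting a suggestion can never introduce an unsound step.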


URL

https://arxiv.org/abs/2404.12534

PDF

https://arxiv.org/pdf/2404.12534.pdf

