Paper Reading AI Learner

FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving

2024-06-20 15:31:05
Xiaohan Lin, Qingxing Cao, Yinya Huang, Haiming Wang, Jianqiao Lu, Zhengying Liu, Linqi Song, Xiaodan Liang

Abstract

Formal verification (FV) has witnessed growing significance with current emerging program synthesis by the evolving large language models (LLMs). However, current formal verification mainly resorts to symbolic verifiers or hand-craft rules, resulting in limitations for extensive and flexible verification. On the other hand, formal languages for automated theorem proving, such as Isabelle, as another line of rigorous verification, are maintained with comprehensive rules and theorems. In this paper, we propose FVEL, an interactive Formal Verification Environment with LLMs. Specifically, FVEL transforms a given code to be verified into Isabelle, and then conducts verification via neural automated theorem proving with an LLM. The joined paradigm leverages the rigorous yet abundant formulated and organized rules in Isabelle and is also convenient for introducing and adjusting cutting-edge LLMs. To achieve this goal, we extract a large-scale FVELER3. The FVELER dataset includes code dependencies and verification processes that are formulated in Isabelle, containing 758 theories, 29,125 lemmas, and 200,646 proof steps in total with in-depth dependencies. We benchmark FVELER in the FVEL environment by first fine-tuning LLMs with FVELER and then evaluating them on Code2Inv and SV-COMP. The results show that FVEL with FVELER fine-tuned Llama3- 8B solves 17.39% (69 -> 81) more problems, and Mistral-7B 12% (75 -> 84) more problems in SV-COMP. And the proportion of proof errors is reduced. Project page: this https URL.

Abstract (translated)

形式验证(FV)在当前新兴的程序合成技术中见证了日益增长的重要性,这些技术正在演变大型语言模型(LLMs)。然而,目前的正式验证主要依赖于符号验证器或人工规则,导致对于广泛和灵活验证的限制。另一方面,像Isabelle这样的自动定理证明形式语言,作为另一条严谨的验证路线,使用全面的规则和定理。在本文中,我们提出了FVEL,一个带有LLM的交互式形式验证环境。具体来说,FVEL将给定的代码转换为Isabelle,然后通过LLM进行定理证明。结合范式利用Isabelle中严谨而丰富的规则和组织,同时也便于引入和调整最新的LLM。为实现这一目标,我们提取了一个大规模FVELER3。FVELER数据集包括在Isabelle中表述的代码依赖关系和验证过程,包括758个定理、29,125个引理和200,646个证明步骤,以及深入的依赖关系。我们在FVEL环境中通过首先用FVELER微调LLM,然后对它们在Code2Inv和SV-COMP上进行评估,来对比FVEL和LLM的性能。结果显示,带有FVELER微调的Llama3-8B解决了17.39%(69 -> 81)的问题,Mistral-7B则解决了12%(75 -> 84)的问题,证明错误的比例降低了。项目页面:此链接。

URL

https://arxiv.org/abs/2406.14408

PDF

https://arxiv.org/pdf/2406.14408.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Time_Series Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot