Paper Reading AI Learner

Evaluating the Ability of Large Language Models to Generate Verifiable Specifications in VeriFast

2024-11-04 17:44:11
Marilyn Rego, Wen Fan, Xin Hu, Sanya Dod, Zhaorui Ni, Danning Xie, Jenna DiVincenzo, Lin Tan

Abstract

Static verification is a powerful method for enhancing software quality, but it demands significant human labor and resources. This is particularly true of static verifiers that reason about heap-manipulating programs using an ownership logic. LLMs have shown promise in a number of software engineering activities, including code generation, test generation, proof generation for theorem provers, and specification generation for static verifiers. However, prior work has not explored how well LLMs can perform specification generation for specifications based on an ownership logic, such as separation logic. To address this gap, this paper explores the effectiveness of large language models (LLMs), specifically OpenAI's GPT models, in generating fully correct specifications based on separation logic for static verification of human-written programs in VeriFast. Our first experiment employed traditional prompt engineering and the second used Chain-of-Thought (CoT) prompting to identify and address common errors generated across the GPT models. The results indicate that GPT models can successfully generate specifications for verifying heap-manipulating code with VeriFast. Furthermore, while CoT prompting significantly reduces syntax errors generated by the GPT models, it does not substantially reduce verification errors compared to traditional prompt engineering.
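To make the task concrete, below is a minimal sketch of the kind of input VeriFast consumes: a heap-manipulating C function whose memory-safety contract is written in separation logic inside `//@` and `/*@ ... @*/` annotations. The `nodes` predicate, the function, and its contract are illustrative examples of standard VeriFast style, not programs or specifications taken from the paper's benchmark set; generating annotations like these automatically is what the paper asks the GPT models to do.

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/*@
// Separation-logic predicate describing ownership of a whole linked list:
// either the list is empty, or we own the head cell (its fields and its
// malloc block) separately (&*&) from the rest of the list.
predicate nodes(struct node *n;) =
    n == 0 ?
        true
    :
        n->value |-> ?v &*& n->next |-> ?next &*&
        malloc_block_node(n) &*& nodes(next);
@*/

// Prepend a value to a list. The contract says: given ownership of the
// list rooted at head, the caller gets back ownership of the new list.
struct node *prepend(struct node *head, int value)
//@ requires nodes(head);
//@ ensures nodes(result);
{
    struct node *n = malloc(sizeof(struct node));
    if (n == 0) abort();
    n->value = value;
    n->next = head;
    return n;
}
```

Because the annotations are ordinary C comments, the file compiles and runs unchanged with a normal C compiler; only the VeriFast tool interprets them, symbolically executing the function body to check that it satisfies the `requires`/`ensures` clauses.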

URL

https://arxiv.org/abs/2411.02318

PDF

https://arxiv.org/pdf/2411.02318.pdf

