Paper Reading AI Learner

Harnessing the Power of Large Language Models for Natural Language to First-Order Logic Translation

2023-05-24 19:59:51
Yuan Yang, Siheng Xiong, Ali Payani, Ehsan Shareghi, Faramarz Fekri

Abstract

Translating natural language sentences to first-order logic (NL-FOL translation) is a longstanding challenge in the NLP and formal logic literature. This paper introduces LogicLLaMA, a LLaMA-7B model fine-tuned for NL-FOL translation using LoRA on a single GPU. LogicLLaMA can directly translate natural language into FOL rules, outperforming GPT-3.5 on this task. LogicLLaMA is also equipped to correct FOL rules predicted by GPT-3.5, and achieves performance similar to GPT-4 at a fraction of the cost. This correction ability was achieved by a novel supervised fine-tuning (SFT) + reinforcement learning with human feedback (RLHF) framework, which initially trains on synthetically perturbed NL-FOL pairs to encourage chain-of-thought reasoning, and then fine-tunes with RLHF on GPT-3.5 outputs using a FOL verifier as the reward model. To train LogicLLaMA, we present MALLS (large language Model generAted NL-FOL pairS), a dataset of 34K high-quality and diverse sentence-level NL-FOL pairs collected from GPT-4. The dataset was created with a pipeline that prompts GPT-4 for pairs, dynamically adjusts the prompts to ensure the collected pairs cover rich and diverse contexts at different levels of complexity, and verifies the validity of the generated FOL rules. Code, weights, and data are available at this https URL.
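To make the task concrete, below is a minimal sketch of what a sentence-level NL-FOL pair looks like, together with a cheap syntactic screen of the kind that could act as the FOL verifier / reward model mentioned above. The example pair, the predicate names, and the regex-based check are illustrative assumptions; the abstract does not specify how the paper's verifier is actually implemented.

```python
# A minimal sketch of an NL-FOL pair and a syntactic FOL check.
# The pair, predicate names, and this balanced-parentheses/token
# screen are illustrative assumptions, not the paper's verifier.
import re

# Hypothetical MALLS-style pair: a sentence and its FOL translation.
nl = "All birds that can fly are not penguins."
fol = "∀x (Bird(x) ∧ CanFly(x) → ¬Penguin(x))"

FOL_TOKEN = re.compile(r"∀|∃|¬|∧|∨|→|↔|\(|\)|,|[A-Za-z_][A-Za-z0-9_]*")

def is_plausible_fol(rule: str) -> bool:
    """Cheap syntactic screen: every character tokenizes as an FOL
    symbol and parentheses are balanced. Not a full parser."""
    if "".join(FOL_TOKEN.findall(rule)) != rule.replace(" ", ""):
        return False  # stray characters that are not FOL tokens
    depth = 0
    for ch in rule:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

# A verifier like this could supply a binary RLHF reward:
# 1.0 if the generated rule passes the check, 0.0 otherwise.
print(is_plausible_fol(fol))                # True
print(is_plausible_fol("∀x (Bird(x) ∧ →"))  # False
```

A real verifier would additionally check well-formedness (arity, quantifier scoping) with a proper grammar, but even a shallow check like this gives a dense, automatic reward signal during RLHF.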

Abstract (translated)

Translating natural language sentences into first-order logic (NL-FOL translation) is a longstanding challenge in the NLP and formal logic literature. This paper introduces LogicLLaMA, a LLaMA-7B model fine-tuned for NL-FOL translation using LoRA on a single GPU. LogicLLaMA can translate natural language directly into FOL rules and performs better than GPT-3.5. LogicLLaMA can also correct FOL rules predicted by GPT-3.5, achieving performance similar to GPT-4 at a fraction of GPT-4's cost. This correction ability is achieved through a novel supervised fine-tuning (SFT) + reinforcement learning with human feedback (RLHF) framework, which initially trains on synthetically perturbed NL-FOL pairs and then fine-tunes with RLHF on GPT-3.5's outputs, using a FOL verifier as the reward model. To train LogicLLaMA, we present MALLS (large language Model generAted NL-FOL pairS), a dataset of 34K high-quality, diverse sentence-level NL-FOL pairs collected from GPT-4. The dataset was created with a pipeline that prompts GPT-4 for pairs, dynamically adjusts the prompts to ensure that pairs with rich and diverse contexts are collected at different levels of complexity, and verifies the validity of the generated FOL rules. Code, weights, and data are available at this https URL.
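For readers who want to reproduce the single-GPU setup described above, here is a minimal sketch of LoRA fine-tuning of LLaMA-7B using the Hugging Face peft library. The checkpoint id, rank, alpha, dropout, and target modules are placeholder assumptions, not the configuration reported by the paper.

```python
# Minimal LoRA setup for LLaMA-7B with Hugging Face peft.
# All hyperparameters below are illustrative assumptions, not the
# paper's reported settings.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "huggyllama/llama-7b"  # placeholder checkpoint id
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    r=16,                 # low-rank adapter dimension (assumed)
    lora_alpha=32,        # adapter scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections
    task_type="CAUSAL_LM",
)

# Wraps the frozen base model with trainable low-rank adapters; only
# the adapter weights are updated, which is what makes fine-tuning a
# 7B model on a single GPU feasible.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

From here, the adapted model can be trained with a standard causal-LM objective on MALLS-style (NL, FOL) pairs formatted as prompt-completion text.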

URL

https://arxiv.org/abs/2305.15541

PDF

https://arxiv.org/pdf/2305.15541.pdf

