Abstract
Translating natural language sentences to first-order logic (NL-FOL translation) is a longstanding challenge in the NLP and formal logic literature. This paper introduces LogicLLaMA, a LLaMA-7B model fine-tuned for NL-FOL translation with LoRA on a single GPU. LogicLLaMA can directly translate natural language into FOL rules, outperforming GPT-3.5. It can also correct FOL rules predicted by GPT-3.5, achieving performance similar to GPT-4 at a fraction of the cost. This correction ability is achieved by a novel supervised fine-tuning (SFT) + reinforcement learning with human feedback (RLHF) framework, which first trains on synthetically perturbed NL-FOL pairs to encourage chain-of-thought reasoning, and then fine-tunes with RLHF on GPT-3.5 outputs using a FOL verifier as the reward model. To train LogicLLaMA, we present MALLS (large language $\textbf{M}$odel gener$\textbf{A}$ted N$\textbf{L}$-FO$\textbf{L}$ pair$\textbf{S}$), a dataset of 34K high-quality, diverse sentence-level NL-FOL pairs collected from GPT-4. The dataset was created with a pipeline that prompts GPT-4 for pairs, dynamically adjusts the prompts to ensure the collected pairs have rich and diverse contexts at different levels of complexity, and verifies the validity of the generated FOL rules. Code, weights, and data are available at $\href{this https URL}{\small \text{this https URL}}$.
URL
https://arxiv.org/abs/2305.15541