Abstract
Large Language Models (LLMs) have demonstrated impressive capabilities across various tasks. However, LLMs often struggle with spatial reasoning, an essential component of reasoning and inference that requires understanding complex relationships between objects in space. This paper proposes a novel neural-symbolic framework that enhances LLMs' spatial reasoning abilities. We evaluate our approach on two benchmark datasets, StepGame and SpartQA, implementing three distinct strategies: (1) ASP (Answer Set Programming)-based symbolic reasoning, (2) an LLM + ASP pipeline built with DSPy, and (3) Fact + Logical rules. Our experiments demonstrate significant improvements over baseline prompting methods, with accuracy increases of 40-50% on the StepGame dataset and 3-13% on the more complex SpartQA dataset. The "LLM + ASP" pipeline achieves particularly strong results on Finding Relations (FR) and Finding Block (FB) questions, though performance varies across question types. These results suggest that while neural-symbolic approaches offer promising directions for enhancing spatial reasoning in LLMs, their effectiveness depends heavily on the specific task characteristics and implementation strategies. We propose an integrated, simple yet effective set of strategies that uses a neural-symbolic pipeline to boost spatial reasoning in LLMs. This pipeline and its strategies show strong potential for application to other LLM reasoning domains, such as temporal reasoning and deductive inference.
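To make the idea of combining extracted facts with logical rules more concrete, here is a minimal sketch of symbolic relation inference on a StepGame-style story. It is written in plain Python rather than ASP, and the relation vocabulary, the unit-offset encoding, and the `infer_relation` helper are illustrative assumptions, not the authors' encoding. Each fact pins one object's grid offset relative to another, and a single composition rule adds offsets along any chain of facts.

```python
from collections import deque

# Assumed encoding: "x is <relation> of y" means pos(x) = pos(y) + offset.
OFFSETS = {
    "left": (-1, 0), "right": (1, 0), "above": (0, 1), "below": (0, -1),
    "upper-left": (-1, 1), "upper-right": (1, 1),
    "lower-left": (-1, -1), "lower-right": (1, -1),
    "overlap": (0, 0),
}
# Reverse lookup: sign pattern of a displacement -> relation label.
LABELS = {offset: name for name, offset in OFFSETS.items()}


def sign(v: int) -> int:
    return (v > 0) - (v < 0)


def infer_relation(facts, a, b):
    """Return the relation of `a` relative to `b`, or None if they are unconnected.

    `facts` is a list of (x, relation, y) triples (hypothetical input format).
    Assumes the facts are consistent: the first position found for a node wins.
    """
    # Build an undirected graph whose edges carry the displacement to apply.
    graph = {}
    for x, rel, y in facts:
        dx, dy = OFFSETS[rel]
        graph.setdefault(y, []).append((x, (dx, dy)))    # step from y to x
        graph.setdefault(x, []).append((y, (-dx, -dy)))  # step from x to y
    if b not in graph:
        return None
    # Composition rule: breadth-first search sums offsets, with b at the origin.
    pos = {b: (0, 0)}
    queue = deque([b])
    while queue:
        node = queue.popleft()
        px, py = pos[node]
        for nxt, (dx, dy) in graph[node]:
            if nxt not in pos:
                pos[nxt] = (px + dx, py + dy)
                queue.append(nxt)
    if a not in pos:
        return None
    ax, ay = pos[a]
    return LABELS[(sign(ax), sign(ay))]


if __name__ == "__main__":
    story = [("A", "above", "B"), ("B", "left", "C"), ("C", "lower-right", "D")]
    print(infer_relation(story, "A", "C"))  # upper-left
    print(infer_relation(story, "C", "A"))  # lower-right
    print(infer_relation(story, "A", "D"))  # overlap
```

In an LLM + ASP pipeline, an LLM would presumably extract facts like these from the story text and a solver would apply the rules. Here both are hard-coded, and because every relation is a unit offset, chains that cancel out collapse to `overlap`.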
URL
https://arxiv.org/abs/2411.18564