Abstract
Whether large language models (LLMs) can reason like humans is a highly contested topic in the machine learning community. Human reasoning, however, is multifaceted and takes various forms, including analogical, spatial, and moral reasoning, among others. This raises the question of whether LLMs can perform equally well across all of these domains. This work investigates the performance of LLMs on different reasoning tasks through experiments that directly use, or draw inspiration from, existing datasets on analogical and spatial reasoning. Additionally, to evaluate the ability of LLMs to reason like humans, their performance is evaluated on more open-ended, natural language questions. My findings indicate that LLMs excel at analogical and moral reasoning, yet struggle to perform as proficiently on spatial reasoning tasks. I believe these experiments are crucial for informing the future development of LLMs, particularly in contexts that require diverse reasoning proficiencies. By shedding light on the reasoning abilities of LLMs, this study aims to advance our understanding of how they can better emulate the cognitive abilities of humans.
URL
https://arxiv.org/abs/2303.12810