Abstract
This paper presents an exhaustive quantitative and qualitative evaluation of Large Language Models (LLMs) for Knowledge Graph (KG) construction and reasoning. We employ eight distinct datasets spanning entity, relation, and event extraction, link prediction, and question answering. Empirically, our findings suggest that GPT-4 outperforms ChatGPT on the majority of tasks and even surpasses fine-tuned models on certain reasoning and question-answering datasets. Moreover, our investigation extends to the potential generalization ability of LLMs for information extraction, culminating in the presentation of the Virtual Knowledge Extraction task and the development of the VINE dataset. Drawing on these empirical findings, we further propose AutoKG, a multi-agent approach that employs LLMs for KG construction and reasoning, which aims to chart the future of this field and offer exciting opportunities for advancement. We anticipate that our research can provide invaluable insights for future undertakings in KG construction and reasoning. Code and datasets will be available in this https URL.
URL
https://arxiv.org/abs/2305.13168