Abstract
A graph is a fundamental data model for representing entities and their complex relationships in society and nature, such as social networks, transportation networks, financial networks, and biomedical systems. Recently, large language models (LLMs) have shown a strong ability to generalize across NLP and multimodal tasks, such as answering users' arbitrary questions and generating domain-specific content. Compared with graph learning models, LLMs offer clear advantages in generalizing across graph tasks: they eliminate the need to train graph learning models and reduce the cost of manual annotation. In this survey, we conduct a comprehensive investigation of existing LLM studies on graph data, summarizing the graph analytics tasks solved by advanced LLMs and pointing out the remaining challenges and future directions. Specifically, we study the key problems of LLM-based generative graph analytics (LLM-GGA) in three categories: LLM-based graph query processing (LLM-GQP), LLM-based graph inference and learning (LLM-GIL), and graph-LLM-based applications. LLM-GQP focuses on the integration of graph analytics techniques and LLM prompts, including graph understanding and knowledge graph (KG) based augmented retrieval, while LLM-GIL focuses on learning and reasoning over graphs, including graph learning, graph-formed reasoning, and graph representation. We summarize the useful prompts incorporated into LLMs to handle different downstream graph tasks. Moreover, we summarize LLM evaluation and benchmark datasets/tasks, and give an in-depth analysis of the pros and cons of LLMs. We also explore open problems and future directions in this exciting interdisciplinary research area of LLMs and graph analytics.
URL
https://arxiv.org/abs/2404.14809