Paper Reading AI Learner

Automated Construction of Theme-specific Knowledge Graphs

2024-04-29 23:14:14
Linyi Ding, Sizhe Zhou, Jinfeng Xiao, Jiawei Han

Abstract

Despite the widespread application of knowledge graphs (KGs) in tasks such as question answering and intelligent conversational systems, existing KGs face two major challenges: coarse information granularity and a lack of timeliness. These considerably hinder the retrieval and analysis of in-context, fine-grained, and up-to-date knowledge from KGs, particularly in highly specialized themes (e.g., specialized scientific research) and rapidly evolving contexts (e.g., breaking news or disaster tracking). To tackle these challenges, we propose the theme-specific knowledge graph (ThemeKG), a KG constructed from a theme-specific corpus, and design an unsupervised framework for ThemeKG construction, named TKGCon. The framework takes a raw theme-specific corpus and generates a high-quality KG that includes the salient entities and relations under the theme. Specifically, we start with an entity ontology of the theme from Wikipedia, based on which we use Large Language Models (LLMs) to generate candidate relations and construct a relation ontology. To parse the documents in the theme corpus, we first map the extracted entity pairs to the ontology and retrieve the candidate relations. Finally, we incorporate the context and the ontology to consolidate the relations for each entity pair. We observe that directly prompting GPT-4 for a theme-specific KG leads to inaccurate entities (such as "two main types" returned as one entity), unclear relations (such as "is", "has"), and wrong relations (such as "have due to", "to start"). In contrast, by constructing the theme-specific KG step by step, our model outperforms GPT-4 and consistently identifies accurate entities and relations. Experimental results also show that our framework excels in evaluations against various KG construction baselines.
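
The abstract describes TKGCon as a staged pipeline: entity ontology, relation ontology, mapping of extracted entity pairs to candidate relations, and context-based consolidation. The sketch below is a toy illustration of that data flow only, under loud assumptions: both ontologies are tiny hand-written dictionaries (in the paper, entity categories come from Wikipedia and candidate relations from an LLM), and the consolidation step, which the paper performs using the document context and the ontology, is replaced here by a naive keyword-overlap heuristic. All names (ENTITY_ONTOLOGY, RELATION_ONTOLOGY, candidate_relations, consolidate, build_kg) are hypothetical and not taken from the authors' code.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL entity ontology: entity mention -> theme category.
# (TKGCon derives its entity ontology from Wikipedia; this is toy data.)
ENTITY_ONTOLOGY = {
    "Hurricane Ian": "Hurricane",
    "Florida": "Region",
    "FEMA": "Agency",
}

# HYPOTHETICAL relation ontology: (head category, tail category) -> candidates.
# (TKGCon generates candidate relations with an LLM; these are fixed strings.)
RELATION_ONTOLOGY = {
    ("Hurricane", "Region"): ["made_landfall_in", "affected"],
    ("Agency", "Region"): ["provided_aid_to"],
}


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str


def candidate_relations(head: str, tail: str) -> list[str]:
    """Map an entity pair to ontology categories and look up candidate relations."""
    head_type = ENTITY_ONTOLOGY.get(head)
    tail_type = ENTITY_ONTOLOGY.get(tail)
    if head_type is None or tail_type is None:
        return []
    return RELATION_ONTOLOGY.get((head_type, tail_type), [])


def consolidate(head: str, tail: str, context: str,
                candidates: list[str]) -> Optional[Triple]:
    """Toy stand-in for consolidation (NOT the paper's method): keep the first
    candidate whose words all occur as tokens in the context sentence."""
    tokens = set(re.findall(r"[a-z]+", context.lower()))
    for relation in candidates:
        if all(word in tokens for word in relation.split("_")):
            return Triple(head, relation, tail)
    return None


def build_kg(mentions: list[tuple[str, str, str]]) -> list[Triple]:
    """Run ontology lookup plus consolidation over (head, tail, context) mentions."""
    triples = []
    for head, tail, context in mentions:
        triple = consolidate(head, tail, context, candidate_relations(head, tail))
        if triple is not None:
            triples.append(triple)
    return triples


MENTIONS = [
    ("Hurricane Ian", "Florida", "Hurricane Ian made landfall in Florida."),
    ("FEMA", "Florida", "FEMA provided aid to Florida after the storm."),
    ("Florida", "Hurricane Ian", "Florida braced for Hurricane Ian."),  # no ontology entry
    ("Hurricane Ian", "Florida", "Officials discussed Hurricane Ian in Florida."),  # unsupported by context
]

for t in build_kg(MENTIONS):
    print(t.head, "--", t.relation, "->", t.tail)
# prints:
# Hurricane Ian -- made_landfall_in -> Florida
# FEMA -- provided_aid_to -> Florida
```

In this toy version, the final step may only choose among ontology-derived candidates, so vague labels such as "is" or "has" cannot be emitted unless the relation ontology contains them. This loosely mirrors the abstract's motivation for building the KG step by step rather than prompting a model for the whole graph at once.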

URL

https://arxiv.org/abs/2404.19146

PDF

https://arxiv.org/pdf/2404.19146.pdf
