
Post-Training Language Models for Continual Relation Extraction

2025-04-07 16:01:22
Sefika Efeoglu, Adrian Paschke, Sonja Schimmler

Abstract

Real-world data, such as news articles, social media posts, and chatbot conversations, is inherently dynamic and non-stationary, presenting significant challenges for constructing real-time structured representations through knowledge graphs (KGs). Relation Extraction (RE), a fundamental component of KG creation, often struggles to adapt to evolving data when traditional models rely on static, outdated datasets. Continual Relation Extraction (CRE) methods tackle this issue by incrementally learning new relations while preserving previously acquired knowledge. This study investigates the application of pre-trained language models (PLMs), specifically large language models (LLMs), to CRE, with a focus on leveraging memory replay to address catastrophic forgetting. We evaluate decoder-only models (e.g., Mistral-7B and Llama2-7B) and encoder-decoder models (e.g., Flan-T5 Base) on the TACRED and FewRel datasets. Task-incremental fine-tuning of LLMs demonstrates superior performance on TACRED over earlier approaches using encoder-only models like BERT, excelling in seen-task accuracy and overall performance (measured by whole and average accuracy), particularly with the Mistral and Flan-T5 models. Results on FewRel are similarly promising, ranking second on the whole- and average-accuracy metrics. This work underscores critical factors in knowledge transfer, language model architecture, and KG completeness, advancing CRE with LLMs and memory replay for dynamic, real-time relation extraction.
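
The two headline metrics are the standard ones in the CRE literature (stated here for clarity; the definitions are not quoted from the paper): after training on task k, whole accuracy is the accuracy on the combined test sets of all k tasks seen so far, and average accuracy is the mean of the k per-task accuracies. The memory-replay strategy the abstract refers to can likewise be summarized in a short sketch. The Python snippet below is a hypothetical illustration of task-incremental fine-tuning with an episodic replay buffer, not the authors' implementation; fine_tune and sample_exemplars are stand-in names, and the per-relation exemplar budget of 10 is an arbitrary assumption.

import random

def fine_tune(model, examples):
    # Placeholder for one round of supervised fine-tuning
    # (model- and framework-specific details omitted).
    ...

def sample_exemplars(task_data, per_relation=10):
    # Keep a small, relation-balanced sample of the task's examples.
    # Assumption: uniform random selection; other strategies exist.
    by_relation = {}
    for example in task_data:
        by_relation.setdefault(example["relation"], []).append(example)
    exemplars = []
    for examples in by_relation.values():
        exemplars.extend(random.sample(examples, min(per_relation, len(examples))))
    return exemplars

def continual_training(model, tasks):
    # Fine-tune on each task in sequence, replaying stored exemplars
    # from earlier tasks to mitigate catastrophic forgetting.
    memory = []  # episodic memory of exemplars from seen tasks
    for task_data in tasks:
        fine_tune(model, task_data + memory)  # new data mixed with replayed memory
        memory.extend(sample_exemplars(task_data))
    return model

Relation-balanced sampling keeps the replay buffer small while still covering every previously seen relation, which is the usual design choice when memory is capped per relation.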


URL

https://arxiv.org/abs/2504.05214

PDF

https://arxiv.org/pdf/2504.05214.pdf

