Biomedical Relation Extraction via Adaptive Document-Relation Cross-Mapping and Concept Unique Identifier

2025-01-09 11:19:40
Yufei Shang, Yanrong Guo, Shijie Hao, Richang Hong

Abstract

Document-level biomedical relation extraction (Bio-RE) aims to identify relations between biomedical entities in long texts and is a key subfield of biomedical text mining. Existing Bio-RE methods struggle with cross-sentence inference, which is essential for capturing relations that span multiple sentences. Previous methods also often overlook the incompleteness of documents and do not integrate external knowledge, which limits contextual richness. In addition, the scarcity of annotated data hampers model training. Recent advances in large language models (LLMs) motivate us to address all of these issues for document-level Bio-RE. Specifically, we propose a document-level Bio-RE framework that combines LLM Adaptive Document-Relation Cross-Mapping (ADRCM) fine-tuning with Concept Unique Identifier (CUI) Retrieval-Augmented Generation (RAG). First, to address data scarcity, we introduce the Iteration-of-REsummary (IoRs) prompt, which guides ChatGPT to focus on entity relations and iteratively refines its output to generate Bio-RE task-specific synthetic data. Next, we propose ADRCM fine-tuning, a novel fine-tuning recipe that establishes mappings across different documents and relations, enhancing the model's contextual understanding and cross-sentence inference capabilities. Finally, during inference, a biomedical-specific RAG approach, named CUI RAG, uses CUIs as indexes for entities, narrowing the retrieval scope and enriching the relevant document contexts. Experiments on three Bio-RE datasets (GDA, CDR, and BioRED) show that the proposed method achieves state-of-the-art performance compared with related work.
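
The abstract gives no implementation details for CUI RAG, so the following is only a minimal sketch of the general idea of using CUIs as retrieval keys: passages are indexed by the concept identifiers they mention, and retrieval for an entity pair is restricted to passages indexed under those identifiers. The function names (`build_cui_index`, `retrieve`), the ranking rule (passages mentioning both entities first), and the identifiers such as `CUI_DRUG_A` are all assumptions made for illustration; they are placeholders, not real UMLS CUIs, and this is not the authors' method.

```python
from collections import defaultdict


def build_cui_index(passages):
    """Map each CUI to the ids of the passages that mention it."""
    index = defaultdict(set)
    for pid, (_text, cuis) in passages.items():
        for cui in cuis:
            index[cui].add(pid)
    return index


def retrieve(index, passages, head_cui, tail_cui, k=2):
    """Return up to k passage texts for an entity pair.

    Assumed ranking rule: passages mentioning both CUIs come first,
    followed by passages mentioning only one of them.
    """
    head_hits = index.get(head_cui, set())
    tail_hits = index.get(tail_cui, set())
    both = head_hits & tail_hits
    either = (head_hits | tail_hits) - both
    ranked = sorted(both) + sorted(either)
    return [passages[pid][0] for pid in ranked[:k]]


# Placeholder corpus: passage id -> (text, set of placeholder CUIs).
passages = {
    "p1": ("Drug A was associated with condition B in treated patients.",
           {"CUI_DRUG_A", "CUI_COND_B"}),
    "p2": ("Drug A is metabolized in the liver.", {"CUI_DRUG_A"}),
    "p3": ("Gene C is expressed in muscle tissue.", {"CUI_GENE_C"}),
}

index = build_cui_index(passages)
context = retrieve(index, passages, "CUI_DRUG_A", "CUI_COND_B")
```

Here `context` would hold the two passages about Drug A, with the one that also mentions condition B ranked first, while the unrelated passage about Gene C is never considered. In a full pipeline the retrieved passages would be appended to the LLM prompt as additional document context; how the paper ranks, filters, or formats them is not stated in the abstract.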

URL

https://arxiv.org/abs/2501.05155

PDF

https://arxiv.org/pdf/2501.05155.pdf

