Maintaining comprehensive and up-to-date knowledge graphs (KGs) is critical for modern AI systems, but manual curation struggles to scale with the rapid growth of scientific literature. This paper presents KARMA, a novel framework employing multi-agent large language models (LLMs) to automate KG enrichment through structured analysis of unstructured text. Our approach employs nine collaborative agents, spanning entity discovery, relation extraction, schema alignment, and conflict resolution, that iteratively parse documents, verify extracted knowledge, and integrate it into existing graph structures while adhering to domain-specific schemas. Experiments on 1,200 PubMed articles from three different domains demonstrate the effectiveness of KARMA in knowledge graph enrichment: it identifies up to 38,230 new entities while achieving 83.1% LLM-verified correctness and reducing conflict edges by 18.6% through multi-layer assessments.
https://arxiv.org/abs/2502.06472
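To make the agent pipeline concrete, here is a minimal sketch of how a KARMA-style enrichment loop could be wired together. The agent prompts, the `call_llm()` stub, and the pipe-separated triple format are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a KARMA-style multi-agent enrichment loop. Prompts, the
# call_llm() stub, and the triple format are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str

@dataclass
class KnowledgeGraph:
    triples: set = field(default_factory=set)

    def conflicts_with(self, t: Triple) -> bool:
        # Toy conflict check: same head and relation but a different tail.
        return any(x.head == t.head and x.relation == t.relation and x.tail != t.tail
                   for x in self.triples)

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder for any chat-completion backend

def enrich(document: str, kg: KnowledgeGraph) -> KnowledgeGraph:
    entities = call_llm(f"Entity discovery: list the entities in:\n{document}")
    raw = call_llm(f"Relation extraction: output 'head | relation | tail' lines "
                   f"for entities {entities} in:\n{document}")
    for line in raw.splitlines():
        head, rel, tail = (s.strip() for s in line.split("|"))
        t = Triple(head, rel, tail)
        verdict = call_llm(f"Verification: is {t} supported by the text? yes/no\n{document}")
        # Conflict resolution: keep only verified, non-conflicting triples.
        if verdict.strip().lower().startswith("yes") and not kg.conflicts_with(t):
            kg.triples.add(t)
    return kg
```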
This study investigates the performance of various large language models (LLMs) on zero-shot end-to-end relation extraction (RE) in Chinese, a task that integrates entity recognition and relation extraction without requiring annotated data. While LLMs show promise for RE, most prior work focuses on English or assumes pre-annotated entities, leaving their effectiveness in Chinese RE largely unexplored. To bridge this gap, we evaluate ChatGPT, Gemini, and LLaMA based on accuracy, efficiency, and adaptability. ChatGPT demonstrates the highest overall performance, balancing precision and recall, while Gemini achieves the fastest inference speed, making it suitable for real-time applications. LLaMA underperforms in both accuracy and latency, highlighting the need for further adaptation. Our findings provide insights into the strengths and limitations of LLMs for zero-shot Chinese RE, shedding light on trade-offs between accuracy and efficiency. This study serves as a foundation for future research aimed at improving LLM adaptability to complex linguistic tasks in Chinese NLP.
https://arxiv.org/abs/2502.05694
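As an illustration of what zero-shot end-to-end RE means in practice, the sketch below builds a single instruction asking a model to jointly find entities and relations in a Chinese sentence with no demonstrations. The prompt wording is a hypothetical stand-in; the abstract does not specify the exact prompts used in the evaluation.

```python
# Illustrative zero-shot prompt for end-to-end Chinese RE (joint entity
# recognition and relation extraction, no demonstrations). The wording is a
# hypothetical stand-in for the paper's actual prompts.
ZERO_SHOT_RE_PROMPT = (
    "请从下面的中文句子中抽取所有(头实体, 关系, 尾实体)三元组。\n"
    "不要使用任何示例,每行输出一个三元组,格式:头实体 | 关系 | 尾实体。\n\n"
    "句子:{sentence}\n"
)

def build_prompt(sentence: str) -> str:
    return ZERO_SHOT_RE_PROMPT.format(sentence=sentence)

print(build_prompt("乔布斯于1976年在加利福尼亚创立了苹果公司。"))
```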
Extracting event relations that deviate from known schemas has proven challenging for previous methods based on multi-class classification, MASK prediction, or prototype matching. Recent advancements in large language models have shown impressive performance through instruction tuning. Nevertheless, in the task of event relation extraction, instruction-based methods face several challenges: the number of inference samples is vast, and the relations between events are non-sequential. To tackle these challenges, we present an improved instruction-based event relation extraction framework named MAQInstruct. First, we transform the task from extracting event relations using given event-event instructions to selecting events using given event-relation instructions, which reduces the number of samples required for inference. Then, by incorporating a bipartite matching loss, we reduce the dependency of the instruction-based method on the generation sequence. Our experimental results demonstrate that MAQInstruct significantly improves the performance of event relation extraction across multiple LLMs.
https://arxiv.org/abs/2502.03954
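The bipartite-matching idea can be illustrated in a few lines: predicted answer slots are assigned to gold items by minimum cost, so the loss no longer depends on the order in which the model generates them. The shapes and cost definition below are simplified assumptions, not MAQInstruct's exact formulation.

```python
# Order-invariant loss via bipartite matching (Hungarian algorithm). The cost
# matrix and shapes are simplified assumptions, not MAQInstruct's exact recipe.
import torch
from scipy.optimize import linear_sum_assignment

def bipartite_matching_loss(pred_logits: torch.Tensor, gold_ids: torch.Tensor) -> torch.Tensor:
    """pred_logits: (P, V) scores for P predicted answer slots over a vocab V.
    gold_ids: (G,) target ids. Each slot is matched to at most one gold item."""
    log_probs = pred_logits.log_softmax(dim=-1)            # (P, V)
    cost = -log_probs[:, gold_ids]                         # (P, G) negative log-likelihoods
    rows, cols = linear_sum_assignment(cost.detach().cpu().numpy())
    rows, cols = torch.as_tensor(rows), torch.as_tensor(cols)
    return cost[rows, cols].mean()                         # loss over the optimal matching

loss = bipartite_matching_loss(torch.randn(4, 100), torch.tensor([7, 42, 3]))
```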
Objective: To evaluate the accuracy, computational cost, and portability of a new Natural Language Processing (NLP) method for extracting medication information from clinical narratives. Materials and Methods: We propose an original transformer-based architecture for the extraction of entities and their relations pertaining to patients' medication regimens. First, we used this approach to train and evaluate a model on French clinical notes, using a newly annotated corpus from Hôpitaux Universitaires de Strasbourg. Second, the portability of the approach was assessed by conducting an evaluation on clinical documents in English from the 2018 n2c2 shared task. Information extraction accuracy and computational cost were assessed by comparison with an available transformer-based method. Results: On the relation extraction task itself, the proposed architecture achieves performance competitive with the state of the art on both French and English (F-measures 0.82 and 0.96 vs. 0.81 and 0.95) while reducing the computational cost by a factor of 10. End-to-end (named entity recognition and relation extraction) F1 performance is 0.69 and 0.82 on the French and English corpora, respectively. Discussion: While an existing system developed for English notes could be deployed in a French hospital setting with reasonable effort, we found that the alternative architecture offered end-to-end drug information extraction with comparable extraction performance and lower computational impact for both French and English clinical text processing. Conclusion: The proposed architecture extracts medication information from clinical text with high performance and low computational cost, and consequently suits the usually limited IT resources of hospitals.
https://arxiv.org/abs/2502.03257
In this work, we reimagine classical probing to evaluate knowledge transfer from simple source tasks to more complex target tasks. Instead of probing frozen representations from a complex source task on diverse simple target probing tasks (as is usually done in probing), we explore the effectiveness of embeddings from multiple simple source tasks on a single target task. We select coreference resolution, a linguistically complex problem requiring contextual understanding, as the focus target task, and test the usefulness of embeddings from comparably simpler tasks such as paraphrase detection, named entity recognition, and relation extraction. Through systematic experiments, we evaluate the impact of individual and combined task embeddings. Our findings reveal that task embeddings vary significantly in utility for coreference resolution, with semantic similarity tasks (e.g., paraphrase detection) proving most beneficial. Additionally, representations from intermediate layers of fine-tuned models often outperform those from final layers. Combining embeddings from multiple tasks consistently improves performance, with attention-based aggregation yielding substantial gains. These insights shed light on the relationships between task-specific representations and their adaptability to complex downstream tasks, encouraging further exploration of embedding-level task transfer.
https://arxiv.org/abs/2501.19316
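A minimal sketch of the attention-based aggregation that the abstract reports as the strongest combination strategy: a learned query attends over one embedding per source task and produces a single fused vector. Dimensions and module names are illustrative assumptions.

```python
# Sketch of attention-based aggregation over embeddings from several source
# tasks (e.g., paraphrase detection, NER, RE). Dimensions are illustrative.
import torch
import torch.nn as nn

class TaskAttentionAggregator(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))   # learned query over task embeddings

    def forward(self, task_embs: torch.Tensor) -> torch.Tensor:
        # task_embs: (num_tasks, dim) -- one embedding per source task
        weights = torch.softmax(task_embs @ self.query, dim=0)   # (num_tasks,)
        return (weights.unsqueeze(-1) * task_embs).sum(0)        # fused (dim,) vector

agg = TaskAttentionAggregator(dim=768)
combined = agg(torch.randn(3, 768))   # e.g., paraphrase, NER, and RE embeddings
```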
Biological relation networks contain rich information for understanding the biological mechanisms behind the relationships of entities such as genes, proteins, diseases, and chemicals. The vast growth of biomedical literature poses significant challenges for updating the network knowledge. The recent Biomedical Relation Extraction Dataset (BioRED) provides valuable manual annotations, facilitating the development of machine-learning and pre-trained language model approaches for automatically identifying novel document-level (inter-sentence context) relationships. Nonetheless, its annotations lack directionality (subject/object) for the entity roles, which is essential for studying complex biological networks. Herein, we annotate the entity roles of the relationships in the BioRED corpus and subsequently propose a novel multi-task language model with soft-prompt learning to jointly identify the relationship, novel findings, and entity roles. Our results include an enriched BioRED corpus with 10,864 directionality annotations. Moreover, our proposed method outperforms existing large language models such as the state-of-the-art GPT-4 and Llama-3 on two benchmarking tasks. Our source code and dataset are available at this https URL.
https://arxiv.org/abs/2501.14079
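The soft-prompt component can be pictured as a small set of trainable vectors prepended to the input embeddings while the backbone stays frozen. The sketch below shows only that generic mechanism; the paper's multi-task heads and prompt design are not reproduced here, and the sizes are assumptions.

```python
# Generic soft-prompt layer: trainable vectors prepended to input embeddings
# while the backbone model stays frozen. Sizes are illustrative.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_prompt_tokens: int, dim: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, dim)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)   # (batch, n_prompt+seq, dim)

sp = SoftPrompt(n_prompt_tokens=20, dim=768)
extended = sp(torch.randn(2, 128, 768))                   # -> (2, 148, 768)
```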
Document-Level Biomedical Relation Extraction (Bio-RE) aims to identify relations between biomedical entities within extensive texts, serving as a crucial subfield of biomedical text mining. Existing Bio-RE methods struggle with cross-sentence inference, which is essential for capturing relations spanning multiple sentences. Moreover, previous methods often overlook the incompleteness of documents and lack the integration of external knowledge, limiting contextual richness. Besides, the scarcity of annotated data further hampers model training. Recent advancements in large language models (LLMs) have inspired us to explore all the above issues for document-level Bio-RE. Specifically, we propose a document-level Bio-RE framework via LLM Adaptive Document-Relation Cross-Mapping (ADRCM) fine-tuning and Concept Unique Identifier (CUI) Retrieval-Augmented Generation (RAG). First, we introduce the Iteration-of-REsummary (IoRs) prompt to address the data scarcity issue: Bio-RE task-specific synthetic data are generated by guiding ChatGPT to focus on entity relations and to iteratively refine the synthetic data. Next, we propose ADRCM fine-tuning, a novel fine-tuning recipe that establishes mappings across different documents and relations, enhancing the model's contextual understanding and cross-sentence inference capabilities. Finally, during inference, a biomedical-specific RAG approach, named CUI RAG, leverages CUIs as indexes for entities, narrowing the retrieval scope and enriching the relevant document contexts. Experiments on three Bio-RE datasets (GDA, CDR, and BioRED) demonstrate that the proposed method achieves state-of-the-art performance compared with related work.
https://arxiv.org/abs/2501.05155
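The CUI-as-index idea can be shown with a toy inverted index: documents are keyed by the UMLS Concept Unique Identifiers they mention, so retrieval for an entity pair only considers documents that share those concepts. The data structures and example CUIs below are assumptions for illustration.

```python
# Toy CUI-indexed retrieval: documents are indexed by the UMLS CUIs they
# mention, narrowing the retrieval scope for an entity pair. Illustrative only.
from collections import defaultdict

class CuiIndex:
    def __init__(self):
        self.index = defaultdict(set)              # CUI -> set of doc ids

    def add(self, doc_id: str, cuis: set):
        for cui in cuis:
            self.index[cui].add(doc_id)

    def retrieve(self, head_cui: str, tail_cui: str) -> set:
        # Only documents mentioning both concepts are candidate contexts.
        return self.index[head_cui] & self.index[tail_cui]

idx = CuiIndex()
idx.add("doc1", {"C0011849", "C0027051"})          # e.g., diabetes mellitus, myocardial infarction
idx.add("doc2", {"C0011849"})
print(idx.retrieve("C0011849", "C0027051"))        # {'doc1'}
```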
Cyber Threat Intelligence (CTI) is critical for mitigating threats to organizations, governments, and institutions, yet the necessary data are often dispersed across diverse formats. AI-driven solutions for CTI Information Extraction (IE) typically depend on high-quality, annotated data, which are not always available. This paper introduces 0-CTI, a scalable AI-based framework designed for efficient CTI Information Extraction. Leveraging advanced Natural Language Processing (NLP) techniques, particularly Transformer-based architectures, the proposed system processes complete text sequences of CTI reports to extract a cyber ontology of named entities and their relationships. Our contribution is the development of 0-CTI, the first modular framework for CTI Information Extraction that supports both supervised and zero-shot learning. Unlike existing state-of-the-art models that rely heavily on annotated datasets, our system enables fully dataless operation through zero-shot methods for both Entity and Relation Extraction, making it adaptable to various data availability scenarios. Additionally, our supervised Entity Extractor surpasses current state-of-the-art performance in cyber Entity Extraction, highlighting the dual strength of the framework in both low-resource and data-rich environments. By aligning the system's outputs with the Structured Threat Information Expression (STIX) format, a standard for information exchange in the cybersecurity domain, 0-CTI standardizes extracted knowledge, enhancing communication and collaboration in cybersecurity operations.
https://arxiv.org/abs/2501.06239
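To illustrate the STIX alignment step, the sketch below serializes one extracted (actor, uses, malware) triple with the `stix2` Python library. The triple itself is made up, and the abstract does not state which serialization library 0-CTI uses; this is one plausible realization.

```python
# Sketch of aligning an extracted (entity, relation, entity) triple with the
# STIX format via the stix2 library. The triple is a made-up example.
from stix2 import ThreatActor, Malware, Relationship, Bundle

actor = ThreatActor(name="APT-Example")                  # extracted entity (hypothetical)
malware = Malware(name="ExampleRAT", is_family=False)    # extracted entity (hypothetical)
rel = Relationship(source_ref=actor.id,                  # extracted relation: actor uses malware
                   relationship_type="uses",
                   target_ref=malware.id)

bundle = Bundle(actor, malware, rel)
print(bundle.serialize(pretty=True))
```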
We introduce GLiREL (Generalist Lightweight model for zero-shot Relation Extraction), an efficient architecture and training paradigm for zero-shot relation classification. Inspired by recent advancements in zero-shot named entity recognition, this work presents an approach to efficiently and accurately predict zero-shot relationship labels between multiple entities in a single forward pass. Experiments using the FewRel and WikiZSL benchmarks demonstrate that our approach achieves state-of-the-art results on the zero-shot relation classification task. In addition, we contribute a protocol for synthetically-generating datasets with diverse relation labels.
https://arxiv.org/abs/2501.03172
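The single-forward-pass idea can be reduced to one matrix multiplication: embed the candidate relation labels and the entity-pair representations in a shared space, then score every pair against every label at once. The encoders producing these representations are stand-ins here, not GLiREL's actual modules.

```python
# Sketch of zero-shot relation scoring in a single pass: all entity pairs are
# scored against all label embeddings with one matmul. Encoders are stand-ins.
import torch
import torch.nn.functional as F

def score_pairs(pair_reprs: torch.Tensor, label_reprs: torch.Tensor) -> torch.Tensor:
    """pair_reprs: (num_pairs, dim); label_reprs: (num_labels, dim).
    Returns (num_pairs, num_labels) cosine-similarity scores."""
    pair = F.normalize(pair_reprs, dim=-1)
    label = F.normalize(label_reprs, dim=-1)
    return pair @ label.T

scores = score_pairs(torch.randn(10, 768), torch.randn(5, 768))
pred = scores.argmax(-1)   # best zero-shot label for each entity pair
```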
Generative relation extraction (RE) commonly involves first reformulating RE as a linguistic modeling problem easily tackled with pre-trained language models (PLMs) and then fine-tuning a PLM with a supervised cross-entropy loss. Although these approaches have achieved promising performance, they assume only one deterministic relation between each pair of entities, ignoring real scenarios where multiple relations may be valid, i.e., entity pair overlap, which limits their applicability. To address this problem, we introduce a novel contrastive prompt tuning method for RE, CPTuning, which learns to associate a candidate relation between two in-context entities with a probability mass above or below a threshold, corresponding to whether the relation exists. Beyond learning schema, CPTuning also organizes RE as a verbalized relation generation task and uses Trie-constrained decoding to ensure that the model generates valid relations. During inference, it adaptively picks out generated candidate relations with a high estimated likelihood, thereby achieving multi-relation extraction. We conduct extensive experiments on four widely used datasets to validate our method. Results show that T5-large fine-tuned with CPTuning significantly outperforms previous methods, whether extracting single or multiple relations.
https://arxiv.org/abs/2501.02196
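Trie-constrained decoding is easy to sketch: valid verbalized relations are stored as token-id paths in a prefix trie, and at each step generation is restricted to tokens that continue some valid relation. The toy token ids and the Hugging Face-style hook below are illustrative assumptions.

```python
# Sketch of Trie-constrained decoding: only token ids that extend some valid
# verbalized relation are allowed at each step. Token ids are toy values; the
# hook mirrors Hugging Face's prefix_allowed_tokens_fn convention.
class Trie:
    def __init__(self, sequences):
        self.root = {}
        for seq in sequences:
            node = self.root
            for tok in seq:
                node = node.setdefault(tok, {})

    def allowed(self, prefix):
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []            # prefix is not part of any valid relation
            node = node[tok]
        return list(node)            # token ids that may come next

relations = [[5, 17, 2], [5, 9, 2], [8, 3, 2]]   # tokenized relation strings (toy)
trie = Trie(relations)
prompt_len = 0                                    # assumed offset where relation tokens start

def prefix_allowed_tokens_fn(batch_id, input_ids):
    return trie.allowed(input_ids.tolist()[prompt_len:])

print(trie.allowed([5]))   # -> [17, 9]: both continuations of prefix [5] are legal
```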
Benchmarks are crucial for evaluating machine learning algorithm performance, facilitating comparison and identifying superior solutions. However, biases within datasets can lead models to learn shortcut patterns, resulting in inaccurate assessments and hindering real-world applicability. This paper addresses the issue of entity bias in relation extraction tasks, where models tend to rely on entity mentions rather than context. We propose DREB, a debiased relation extraction benchmark that breaks the pseudo-correlation between entity mentions and relation types through entity replacement. DREB uses a Bias Evaluator and a PPL Evaluator to ensure low bias and high naturalness, providing a reliable and accurate assessment of model generalization under entity bias. To establish a new baseline on DREB, we introduce MixDebias, a debiasing method combining data-level and model-training-level techniques. MixDebias effectively improves model performance on DREB while maintaining performance on the original dataset. Extensive experiments demonstrate the effectiveness and robustness of MixDebias compared to existing methods, highlighting its potential for improving the generalization ability of relation extraction models. We will release DREB and MixDebias publicly.
https://arxiv.org/abs/2501.01349
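The entity-replacement step behind such a benchmark can be illustrated as swapping each mention for a type-consistent substitute, so models can no longer lean on memorized entity names. The substitute pools and example below are invented for illustration.

```python
# Illustrative entity replacement: swap mentions for type-consistent
# substitutes to break entity-name shortcuts. Substitute pools are made up.
import random

SUBSTITUTES = {
    "PERSON": ["Alex Morgan", "Wei Chen", "Amara Diallo"],
    "ORG": ["Nordwind Labs", "Kestrel Systems"],
}

def replace_entities(text: str, mentions: list) -> str:
    """mentions: list of (surface, type) pairs found in `text`."""
    for surface, etype in mentions:
        text = text.replace(surface, random.choice(SUBSTITUTES[etype]))
    return text

print(replace_entities("Tim Cook leads Apple.",
                       [("Tim Cook", "PERSON"), ("Apple", "ORG")]))
```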
Multi-task semantic communication can serve multiple learning tasks using a shared encoder model. Existing models have overlooked the intricate relationships between the features extracted during the encoding process of different tasks. This paper adds a new graph attention inter-block (GAI) module to the encoder/transmitter of a multi-task semantic communication system, which enriches the features for multiple tasks by embedding the intermediate outputs of encoding into those features, in contrast to existing techniques. The key idea is that we interpret the outputs of the intermediate feature extraction blocks of the encoder as the nodes of a graph to capture the correlations of the intermediate features. Another important aspect is that we refine the node representation using a graph attention mechanism to extract the correlations and a multi-layer perceptron network to associate the node representations with different tasks. Consequently, the intermediate features are weighted and embedded into the features transmitted for executing multiple tasks at the receiver. Experiments demonstrate that, when the bandwidth ratio of the communication channel (i.e., the compression level for transmission over the channel) is constrained to 1/12, the proposed model surpasses the most competitive publicly available model by 11.4% on the CityScapes 2Task dataset and outperforms the established state of the art by 3.97% on the NYU V2 3Task dataset.
https://arxiv.org/abs/2501.02006
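A compact way to picture the GAI module: the intermediate outputs of the encoder blocks form the nodes of a graph, attention mixes them, and an MLP maps the result to per-task features. Layer sizes and the pooling choice below are illustrative assumptions, not the paper's exact design.

```python
# Sketch of the graph-attention-inter-block idea: encoder block outputs become
# graph nodes, attention mixes them, and an MLP produces per-task features.
# Sizes and pooling are illustrative assumptions.
import torch
import torch.nn as nn

class GraphAttentionInterBlock(nn.Module):
    def __init__(self, dim: int, num_tasks: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.task_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                      nn.Linear(dim, num_tasks * dim))
        self.num_tasks = num_tasks

    def forward(self, block_outputs: torch.Tensor) -> torch.Tensor:
        # block_outputs: (batch, num_blocks, dim) -- nodes of the feature graph
        nodes, _ = self.attn(block_outputs, block_outputs, block_outputs)
        pooled = nodes.mean(dim=1)                       # (batch, dim)
        per_task = self.task_mlp(pooled)                 # (batch, num_tasks*dim)
        return per_task.view(-1, self.num_tasks, pooled.size(-1))

gai = GraphAttentionInterBlock(dim=256, num_tasks=2)
feats = gai(torch.randn(8, 5, 256))   # 5 intermediate blocks -> 2 task features
```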
Document-level relation extraction (Doc-RE) aims to extract relations between entities across multiple sentences. Compared with sentence-level RE, Doc-RE therefore requires more comprehensive, human-like reasoning abilities, involving complex cross-sentence interactions between entities, contexts, and external general knowledge. However, most existing Doc-RE methods focus on optimizing a single reasoning ability and lack the ability to utilize external knowledge for comprehensive reasoning on long documents. To solve these problems, we propose KnowRA, a knowledge-retrieval-augmented method with comprehensive reasoning that autonomously determines whether to accept external knowledge to assist Doc-RE. First, we construct a document graph for semantic encoding and integrate a co-reference resolution model into KnowRA to augment its co-reference reasoning ability. Then, we expand the document graph into a document knowledge graph by retrieving from an external knowledge base, and we introduce an axis attention mechanism into KnowRA to improve its common-sense and logical reasoning abilities, respectively. Finally, a knowledge filtering method in the common-sense and co-reference reasoning modules filters out irrelevant knowledge. Extensive experiments on two datasets verify the effectiveness of our method compared with state-of-the-art baselines. Our code is available at this https URL.
https://arxiv.org/abs/2501.00571
Integrating Large Language Models (LLMs) in healthcare diagnosis demands systematic frameworks that can handle complex medical scenarios while maintaining specialized expertise. We present KG4Diagnosis, a novel hierarchical multi-agent framework that combines LLMs with automated knowledge graph construction, encompassing 362 common diseases across medical specialties. Our framework mirrors real-world medical systems through a two-tier architecture: a general practitioner (GP) agent for initial assessment and triage, coordinating with specialized agents for in-depth diagnosis in specific domains. The core innovation lies in our end-to-end knowledge graph generation methodology, incorporating: (1) semantic-driven entity and relation extraction optimized for medical terminology, (2) multi-dimensional decision relationship reconstruction from unstructured medical texts, and (3) human-guided reasoning for knowledge expansion. KG4Diagnosis serves as an extensible foundation for specialized medical diagnosis systems, with capabilities to incorporate new diseases and medical knowledge. The framework's modular design enables seamless integration of domain-specific enhancements, making it valuable for developing targeted medical diagnosis systems. We provide architectural guidelines and protocols to facilitate adoption across medical contexts.
https://arxiv.org/abs/2412.16833
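The two-tier architecture can be sketched as a triage step followed by a specialist hand-off. The specialty list, prompts, and `call_llm()` stub below are hypothetical; the framework's actual agents and knowledge-graph lookups are not reproduced.

```python
# Hypothetical sketch of two-tier GP -> specialist routing. Specialty names,
# prompts, and the call_llm() stub are illustrative assumptions.
SPECIALISTS = {
    "cardiology": "You are a cardiologist. Diagnose in depth.",
    "neurology": "You are a neurologist. Diagnose in depth.",
}

def call_llm(system: str, user: str) -> str:
    raise NotImplementedError  # placeholder for any chat-completion backend

def diagnose(symptoms: str) -> str:
    # Tier 1: GP agent performs initial assessment and triage.
    choice = call_llm(
        "You are a general practitioner doing triage. "
        f"Answer with exactly one of: {', '.join(SPECIALISTS)}.",
        symptoms,
    ).strip().lower()
    if choice not in SPECIALISTS:
        choice = next(iter(SPECIALISTS))  # crude fallback if the GP answer is malformed
    # Tier 2: the matching specialist agent produces the in-depth diagnosis.
    return call_llm(SPECIALISTS[choice], f"Patient presentation: {symptoms}")
```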
This paper proposes a novel approach to developing DragonVerseQA, an open-domain, long-form Over-The-Top (OTT) Question-Answering (QA) dataset oriented to the fantasy universe of the "House of the Dragon" and "Game of Thrones" TV series. Most existing QA datasets focus on short, fact-based answers sourced almost solely from Wikipedia articles, and lack the depth and contextual richness needed for sophisticated narrative understanding. We curate a dataset that combines full episode summaries sourced from HBO and fandom wiki websites, user reviews from sources like IMDb and Rotten Tomatoes, other high-quality, open-domain, legally admissible sources, and structured data from repositories like WikiData. The dataset provides multi-dimensional context, reflecting complex character dynamics and plot developments from these varied sources. Meaningful, unbiased, non-spam reviews enter the enriched dataset only after heavy data preprocessing and filtering. Comprehensive insights are given through the long-form answers generated from this enriched context. This makes the dataset valuable for improving conversational AI, narrative analysis, sentiment analysis, summarization techniques, and relation extraction. A comparative analysis with state-of-the-art QA datasets such as SQuAD 2.0, TriviaQA, and Natural Questions highlights the unique advantages of our dataset in terms of contextual complexity and answer length. Detailed reviews add layers to audience sentiment and narrative interpretation, raising the bar for domain-specific QA with a new quality benchmark. Our work also enables a deeper understanding of entertainment-industry content and opens the door to more knowledgeable and creative AI-driven interactions within digital media environments.
https://arxiv.org/abs/2412.16694
Document-level Relation Extraction (DocRE) aims to identify relationships between entity pairs within a document. However, most existing methods assume a uniform label distribution, resulting in suboptimal performance on real-world, imbalanced datasets. To tackle this challenge, we propose a novel data augmentation approach that uses generative models to augment data in the embedding space. Our method leverages the Variational Autoencoder (VAE) architecture to capture all relation-wise distributions formed by entity-pair representations and to augment data for underrepresented relations. To better capture the multi-label nature of DocRE, we parameterize the VAE's latent space with a diffusion model. Additionally, we introduce a hierarchical training framework to integrate the proposed VAE-based augmentation module into DocRE systems. Experiments on two benchmark datasets demonstrate that our method outperforms state-of-the-art models, effectively addressing the long-tail distribution problem in DocRE.
https://arxiv.org/abs/2412.13503
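A minimal sketch of the embedding-space augmentation step: a VAE encodes an entity-pair representation, and sampling around its latent code yields synthetic examples for underrepresented relations. The diffusion-parameterized latent space from the abstract is omitted; sizes and the sampling rule are assumptions.

```python
# Minimal VAE-based embedding-space augmentation: sample around the latent
# code of a rare relation's pair representation. Sizes are assumptions; the
# diffusion-parameterized prior from the abstract is omitted.
import torch
import torch.nn as nn

class PairVAE(nn.Module):
    def __init__(self, dim=768, latent=64):
        super().__init__()
        self.enc = nn.Linear(dim, latent * 2)   # outputs mean and log-variance
        self.dec = nn.Linear(latent, dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization
        return self.dec(z), mu, logvar

def augment(vae: PairVAE, x: torch.Tensor, n: int) -> torch.Tensor:
    """Sample n synthetic representations around x for a rare relation."""
    with torch.no_grad():
        mu, logvar = vae.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn(n, mu.size(-1)) * (0.5 * logvar).exp()
        return vae.dec(z)

vae = PairVAE()
synthetic = augment(vae, torch.randn(768), n=16)   # 16 augmented pair embeddings
```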
To efficiently select optimal dataset combinations for enhancing multi-task learning (MTL) performance in large language models, we propose a novel framework that leverages a neural network to predict the best dataset combinations. The framework iteratively refines the selection, greatly improving efficiency, while being model-, dataset-, and domain-independent. Through experiments on 12 biomedical datasets across four tasks (named entity recognition, relation extraction, event extraction, and text classification), we demonstrate that our approach effectively identifies better combinations, even for tasks that may seem unpromising from a human perspective. This verifies that our framework provides a promising solution for maximizing MTL potential.
https://arxiv.org/abs/2412.11455
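The iterative loop can be sketched as alternating phases: probe the real MTL performance of some combinations, refit a small predictor on everything probed so far, then let the predictor rank unseen combinations for the next round. The predictor architecture, search budget, and `evaluate()` stub are assumptions for illustration.

```python
# Sketch of iterative predict-and-probe search over dataset combinations.
# Predictor architecture, budgets, and the evaluate() stub are assumptions.
import random
import torch
import torch.nn as nn

def evaluate(combo: tuple) -> float:
    """Placeholder: train an MTL model on the chosen datasets, return dev score."""
    raise NotImplementedError

def search(num_datasets: int, rounds: int = 5, probes_per_round: int = 8):
    predictor = nn.Sequential(nn.Linear(num_datasets, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
    history = []                                    # probed (combo, score) pairs
    candidates = [tuple(random.randint(0, 1) for _ in range(num_datasets))
                  for _ in range(probes_per_round)]
    for _ in range(rounds):
        for combo in candidates:                    # probe real MTL performance
            history.append((combo, evaluate(combo)))
        x = torch.tensor([c for c, _ in history], dtype=torch.float)
        y = torch.tensor([s for _, s in history]).unsqueeze(-1)
        for _ in range(200):                        # refit the predictor
            opt.zero_grad()
            nn.functional.mse_loss(predictor(x), y).backward()
            opt.step()
        pool = [tuple(random.randint(0, 1) for _ in range(num_datasets))
                for _ in range(256)]                # rank unseen combinations
        scored = predictor(torch.tensor(pool, dtype=torch.float)).squeeze(-1)
        candidates = [pool[i] for i in scored.topk(probes_per_round).indices.tolist()]
    return max(history, key=lambda t: t[1])
```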
Information extraction from the scientific literature is one of the main techniques for transforming unstructured knowledge hidden in text into structured data that can then be used for decision-making in downstream tasks. One such area is Trust in AI, which studies the factors contributing to human trust in artificial intelligence applications. The relationships of these factors with human trust in such applications are complex. We therefore explore this space through the lens of information extraction: with input from domain experts, we carefully design annotation guidelines, create the first annotated English dataset in this domain, investigate LLM-guided annotation, and benchmark it against state-of-the-art methods using large language models for named entity and relation extraction. Our results indicate that this problem requires supervised learning, which may not currently be feasible with prompt-based LLMs.
https://arxiv.org/abs/2412.11344
Section identification is an important task for library science, especially knowledge management. Identifying the sections of a paper helps filter noise in entity and relation extraction. In this research, we studied the paper section identification problem in the context of Chinese medical literature analysis, where the subjects, methods, and results are the most valuable from a physician's perspective. Building on previous studies of English literature section identification, we experiment with effective features to use with classic machine learning algorithms to tackle the problem. We find that Conditional Random Fields, which consider sentence interdependency, are more effective at combining different feature sets, such as bag-of-words, part-of-speech, and headings, for Chinese literature section identification. Moreover, we find that classic machine learning algorithms are more effective than generic deep learning models for this problem. Based on these observations, we design a novel deep learning model, the Structural Bidirectional Long Short-Term Memory (SLSTM) model, which models word and sentence interdependency together with contextual information. Experiments on a human-curated asthma literature dataset show that our approach outperforms traditional machine learning methods and other deep learning methods, achieving close to 90% precision and recall on the task. The model shows good potential for use in other text mining tasks. The research has significant methodological and practical implications.
https://arxiv.org/abs/2412.11125
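The CRF setup the study found effective can be sketched with `sklearn-crfsuite`: each sentence becomes a feature dict (bag-of-words, position, heading cues), and the CRF labels the sentence sequence jointly, exploiting sentence interdependency. The feature functions below are simplified assumptions standing in for the paper's feature sets.

```python
# Sentence-sequence section labeling with a linear-chain CRF (sklearn-crfsuite).
# Feature functions are simplified stand-ins for the paper's bag-of-words,
# part-of-speech, and heading feature sets.
import sklearn_crfsuite

def sentence_features(sentences, i):
    sent = sentences[i]
    return {
        "bow": " ".join(sorted(set(sent.split()))),   # crude bag-of-words stand-in
        "is_heading": sent.isupper() or sent.endswith(":"),
        "position": i / max(len(sentences) - 1, 1),   # relative position in the paper
    }

papers = [["OBJECTIVE:", "We studied asthma.", "Methods were standard.", "Results were good."]]
labels = [["heading", "subject", "method", "result"]]

X = [[sentence_features(p, i) for i in range(len(p))] for p in papers]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X))
```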
The rise of chronic diseases and pandemics like COVID-19 has emphasized the need for effective patient data processing while ensuring privacy through anonymization and de-identification of protected health information (PHI). Anonymized data facilitates research without compromising patient confidentiality. This paper introduces expert small AI models developed using the LLM-in-the-loop methodology to meet the demand for domain-specific de-identification NER models. These models overcome the privacy risks associated with large language models (LLMs) used via APIs by eliminating the need to transmit or store sensitive data. More importantly, they consistently outperform LLMs in de-identification tasks, offering superior performance and reliability. Our de-identification NER models, developed in eight languages (English, German, Italian, French, Romanian, Turkish, Spanish, and Arabic), achieved average micro-F1 scores of 0.966, 0.975, 0.976, 0.970, 0.964, 0.974, 0.978, and 0.953, respectively. These results establish them as the most accurate healthcare anonymization solutions, surpassing existing small models and even general-purpose LLMs such as GPT-4o. While Part 1 of this series introduced the LLM-in-the-loop methodology for biomedical document translation, this second paper showcases its success in developing cost-effective expert small NER models for de-identification tasks. Our findings lay the groundwork for future healthcare AI innovations, including biomedical entity and relation extraction, demonstrating the value of specialized models for domain-specific challenges.
https://arxiv.org/abs/2412.10918