Ontology

Towards Efficient Patient Recruitment for Clinical Trials: Application of a Prompt-Based Learning Model

2024-04-24 20:42:28

Mojdeh Rahmanian, Seyed Mostafa Fakhrahmad, Seyedeh Zahra Mousavi

arXiv_CL

arXiv_CL Face Summarization Ontology Language_Model Transformer Pose Medical Chat
Abstract

Objective: Clinical trials are essential for advancing pharmaceutical interventions, but they face a bottleneck in selecting eligible participants. Although leveraging electronic health records (EHR) for recruitment has gained popularity, the complex nature of unstructured medical texts makes it difficult to identify participants efficiently. Natural Language Processing (NLP) techniques have emerged as a solution, with a recent focus on transformer models. In this study, we aimed to evaluate the performance of a prompt-based large language model for the cohort selection task from unstructured medical notes collected in the EHR. Methods: To process the medical records, we selected the sentences of each record most related to the eligibility criteria of the trial. The SNOMED CT concepts related to each eligibility criterion were collected. Medical records were also annotated with MedCAT based on the SNOMED CT ontology. Annotated sentences containing concepts that matched the criteria-relevant terms were extracted. A prompt-based large language model (Generative Pre-trained Transformer (GPT) in this study) was then used with the extracted sentences as the training set. To assess its effectiveness, we evaluated the model's performance using the dataset from the 2018 n2c2 challenge, which aimed to classify medical records of 311 patients based on 13 eligibility criteria through NLP techniques. Results: Our proposed model achieved overall micro and macro F measures of 0.9061 and 0.8060, respectively, which were among the highest scores achieved by the experiments performed with this dataset. Conclusion: The prompt-based large language model applied in this study to classify patients based on eligibility criteria achieved promising scores. In addition, we proposed a method of extractive summarization with the aid of the SNOMED CT ontology that can also be applied to other medical texts.
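
The abstract reports both a micro and a macro F measure, and the gap between them (0.9061 versus 0.8060) is easier to read with the two averages side by side. The following is a minimal sketch, assuming per-criterion counts of true positives, false positives, and false negatives; the criterion names and counts are invented for illustration, and the official n2c2 scorer may compute its averages differently.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one criterion.
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when all three counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_criterion: Dict[str, Counts]) -> float:
    """Pool the counts over all criteria, then compute a single F1."""
    tp = sum(c[0] for c in per_criterion.values())
    fp = sum(c[1] for c in per_criterion.values())
    fn = sum(c[2] for c in per_criterion.values())
    return f1(tp, fp, fn)


def macro_f1(per_criterion: Dict[str, Counts]) -> float:
    """Compute F1 per criterion, then take the unweighted mean."""
    scores = [f1(*c) for c in per_criterion.values()]
    return sum(scores) / len(scores)


# Hypothetical counts: one frequent criterion classified well,
# one rare criterion classified poorly.
EXAMPLE_COUNTS: Dict[str, Counts] = {
    "common_criterion": (90, 5, 5),
    "rare_criterion": (2, 3, 5),
}

if __name__ == "__main__":
    print("micro F1:", round(micro_f1(EXAMPLE_COUNTS), 4))
    print("macro F1:", round(macro_f1(EXAMPLE_COUNTS), 4))
```

Because the micro average pools counts, frequent criteria dominate it, while the macro average gives a rare, hard criterion the same weight as a common one. A macro score well below the micro score, as reported here, is consistent with weaker performance on some less frequent criteria, although the abstract itself does not say which ones.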

Abstract (translated)

目标：临床试验对于推动制药干预至关重要，但在选择合适参与者方面存在瓶颈。尽管利用电子病历（EHR）进行招募的做法已经受到欢迎，但非结构化医疗文本复杂的 nature 提出了有效地识别参与者的挑战。自然语言处理（NLP）技术在最近关注于Transformer模型方面成为了解决方案。在这项研究中，我们旨在评估基于提示的大型语言模型在从EHR中收集的非结构化医疗文本的队列选择任务中的性能。方法：为了处理医学记录，我们选择了与需要试验资格标准相关的最相关的句子。收集了与每个资格标准相关的SNOMED CT概念。同时，根据SNOMED CT语义数据库对医学记录进行了注释。包括与标准匹配的概念的注解句子被提取出来。然后，使用基于提示的大型语言模型（本研究中使用的是Generative Pre-trained Transformer（GPT））对提取的句子进行训练。为了评估其效果，我们使用2018 n2c2挑战的数据集来评估模型的性能，该数据集旨在根据13个资格标准对311名患者的医疗记录进行分类。结果：与该数据集上进行的实验相比，我们提出的模型在整体微和宏观F分数方面得分最高，为0.9061和0.8060，这是该数据集中实现的最高分数。结论：将提示式大型语言模型应用于根据资格标准对患者进行分类，在本研究中得到了有前景的分数。此外，我们还提出了使用SNOMED CT语义数据库的提取式总结方法，该方法也可以应用于其他医学文本。

URL

https://arxiv.org/abs/2404.16198

PDF

https://arxiv.org/pdf/2404.16198.pdf
MoDE: CLIP Data Experts via Clustering

2024-04-24 17:59:24

Jiawei Ma, Po-Yao Huang, Saining Xie, Shang-Wen Li, Luke Zettlemoyer, Shih-Fu Chang, Wen-Tau Yih, Hu Xu

arXiv_CV

arXiv_CV Caption Classification Image_Classification Relation Inference Ontology Pose Zero-Shot
Abstract

The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering. Each data expert is trained on one data cluster, being less sensitive to false negative noises in other clusters. At inference time, we ensemble their outputs by applying weights determined through the correlation between task metadata and cluster conditions. To estimate the correlation precisely, the samples in one cluster should be semantically similar, but the number of data experts should still be reasonable for training and inference. As such, we consider the ontology in human language and propose to use fine-grained cluster centers to represent each data expert at a coarse-grained level. Experimental studies show that four CLIP data experts on ViT-B/16 outperform the ViT-L/14 by OpenAI CLIP and OpenCLIP on zero-shot image classification but with less (<35%) training cost. Meanwhile, MoDE can train all data experts asynchronously and can flexibly include new data experts. The code is available at this https URL.
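
The inference-time ensembling described above can be illustrated with a small sketch. It assumes that the "correlation between task metadata and cluster conditions" is approximated by cosine similarity between a task embedding and each expert's cluster center, turned into weights with a temperature softmax. That choice, the function names, and the toy vectors are all assumptions made for illustration; they are not taken from the MoDE code.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expert_weights(task_embedding: Vector,
                   cluster_centers: List[Vector],
                   temperature: float = 0.1) -> List[float]:
    """Softmax over task-to-center similarities (assumed weighting rule)."""
    sims = [cosine(task_embedding, center) for center in cluster_centers]
    top = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp((s - top) / temperature) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_logits(expert_logits: List[Vector],
                    weights: List[float]) -> Vector:
    """Weighted sum of the per-expert class logits."""
    n_classes = len(expert_logits[0])
    return [
        sum(w * logits[k] for w, logits in zip(weights, expert_logits))
        for k in range(n_classes)
    ]


if __name__ == "__main__":
    # Toy setup: two experts, two classes, task close to the first cluster.
    weights = expert_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(weights)
    print(ensemble_logits([[2.0, 0.0], [0.0, 2.0]], weights))
```

The expert whose cluster center lies closest to the task embedding receives most of the weight, which matches the abstract's requirement that samples within a cluster be semantically similar for the weighting to be meaningful.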

Abstract (translated)

对比性语言-图像预训练（CLIP）的成功取决于图像与摘要之间的配对监督，而这类数据往往存在噪声。我们提出了混合数据专家（MoDE）方法并通过聚类学习系统。每个数据专家在一个数据聚类上进行训练，对其他聚类的虚假负噪声更不敏感。在推理时，我们通过任务元数据与聚类条件的关联来应用权重。为了精确估计相关性，一个聚类的样本应该在语义上相似，但数据专家的数量仍应保持在训练和推理的合理范围内。因此，我们在人类语言的语义层次上考虑元数据，并建议在粗粒度层面使用细粒度聚类中心来表示每个数据专家。实验研究表明，在ViT-B/16上，四个CLIP数据专家超过了ViT-L/14上的OpenAI CLIP和OpenCLIP在零散图像分类上的表现，但训练成本较低（<35%）。与此同时，MoDE可以异步训练所有数据专家，并可以灵活地包括新的数据专家。代码可在此处下载：https://thisurl.com

URL

https://arxiv.org/abs/2404.16030

PDF

https://arxiv.org/pdf/2404.16030.pdf
EnzChemRED, a rich enzyme chemistry relation extraction dataset

2024-04-22 14:18:34

Po-Ting Lai, Elisabeth Coudert, Lucila Aimo, Kristian Axelsen, Lionel Breuza, Edouard de Castro, Marc Feuermann, Anne Morgat, Lucille Pourcel, Ivo Pedruzzi, Sylvain Poux, Nicole Redaschi, Catherine Rivoire, Anastasia Sveshnikova, Chih-Hsuan Wei, Robert Leaman, Ling Luo, Zhiyong Lu, Alan Bridge

arXiv_CL

arXiv_CL Recognition Relation Relation_Extraction Ontology Knowledge Language_Model Action
Abstract

Expert curation is essential to capture knowledge of enzyme functions from the scientific literature in FAIR open knowledgebases but cannot keep pace with the rate of new discoveries and new publications. In this work we present EnzChemRED, for Enzyme Chemistry Relation Extraction Dataset, a new training and benchmarking dataset to support the development of Natural Language Processing (NLP) methods such as (large) language models that can assist enzyme curation. EnzChemRED consists of 1,210 expert curated PubMed abstracts in which enzymes and the chemical reactions they catalyze are annotated using identifiers from the UniProt Knowledgebase (UniProtKB) and the ontology of Chemical Entities of Biological Interest (ChEBI). We show that fine-tuning pre-trained language models with EnzChemRED can significantly boost their ability to identify mentions of proteins and chemicals in text (Named Entity Recognition, or NER) and to extract the chemical conversions in which they participate (Relation Extraction, or RE), with average F1 score of 86.30% for NER, 86.66% for RE for chemical conversion pairs, and 83.79% for RE for chemical conversion pairs and linked enzymes. We combine the best performing methods after fine-tuning using EnzChemRED to create an end-to-end pipeline for knowledge extraction from text and apply this to abstracts at PubMed scale to create a draft map of enzyme functions in literature to guide curation efforts in UniProtKB and the reaction knowledgebase Rhea. The EnzChemRED corpus is freely available at this https URL.

Abstract (translated)

专家策展对于从FAIR开放知识库中捕获酶功能知识至关重要，但无法跟上新发现和新出版物的发展速度。在这项工作中，我们提出了EnzChemRED，Enzyme Chemistry Relation Extraction Dataset的训练和基准数据集，以支持开发自然语言处理（NLP）方法，如（大型）语言模型，以协助酶策展。EnzChemRED由1,210个专家编写的PubMed摘要组成，其中酶及其催化的化学反应使用来自UniProt知识库（UniProtKB）和化学生物实体（ChEBI）的标识符进行注释。我们证明了使用EnzChemRED对预训练语言模型进行微调可以显著提高其在文本（命名实体识别，NER）中识别蛋白质和化学物质的提及能力以及提取它们参与的化学转换（关系提取，RE）能力，平均F1分数为86.30% for NER，86.66% for RE for chemical conversion pairs，83.79% for RE for chemical conversion pairs and linked enzymes。我们使用EnzChemRED中表现最好的方法对文本进行微调，创建了从文本到摘要的端到端管道，并将此应用于PubMed大小的摘要以创建酶功能文献的初步映射，以指导在UniProtKB和反应知识库Rhea中的策展工作。EnzChemRED语料库可在此链接处免费获取：https://www.ncbi.nlm.nih.gov/25962541

URL

https://arxiv.org/abs/2404.14209

PDF

https://arxiv.org/pdf/2404.14209.pdf
GraphMatcher: A Graph Representation Learning Approach for Ontology Matching

2024-04-20 18:30:17

Sefika Efeoglu

arXiv_AI

arXiv_AI Attention Represenation_Learning Relation Ontology Matching
Abstract

Ontology matching is defined as finding a relationship or correspondence between two or more entities in two or more ontologies. To solve the interoperability problem of the domain ontologies, semantically similar entities in these ontologies must be found and aligned before merging them. GraphMatcher, developed in this study, is an ontology matching system using a graph attention approach to compute a higher-level representation of a class together with its surrounding terms. GraphMatcher obtained remarkable results in the Ontology Alignment Evaluation Initiative (OAEI) 2022 conference track. Its code is available at this https URL.

Abstract (translated)

语义匹配是一种在两个或多个语义网之间查找关系或对应关系的任务。为了解决领域语义网之间的互操作性问题，本研究开发了一种基于图注意力的语义匹配系统，用于计算类及其周围术语的高级表示。GraphMatcher在2022年Ontology Alignment Evaluation Initiative（OAEI）会议跟踪中取得了显著的成果。其代码可在此处下载：https://this https URL。

URL

https://arxiv.org/abs/2404.14450

PDF

https://arxiv.org/pdf/2404.14450.pdf
CT-ADE: An Evaluation Benchmark for Adverse Drug Event Prediction from Clinical Trial Results

2024-04-19 12:04:32

Anthony Yazdani, Alban Bornet, Boya Zhang, Philipp Khlebnikov, Poorya Amini, Douglas Teodoro

arXiv_CL

arXiv_CL GAN Classification Ontology Prediction Medical
Abstract

Adverse drug events (ADEs) significantly impact clinical research and public health, contributing to failures in clinical trials and leading to increased healthcare costs. The accurate prediction and management of ADEs are crucial for improving the development of safer, more effective medications, and enhancing patient outcomes. To support this effort, we introduce CT-ADE, a novel dataset compiled to enhance the predictive modeling of ADEs. Encompassing over 12,000 instances extracted from clinical trial results, the CT-ADE dataset integrates drug, patient population, and contextual information for multilabel ADE classification tasks in monopharmacy treatments, providing a comprehensive resource for developing advanced predictive models. To mirror the complex nature of ADEs, annotations are standardized at the system organ class level of the Medical Dictionary for Regulatory Activities (MedDRA) ontology. Preliminary analyses using baseline models have demonstrated promising results, achieving 73.33% F1 score and 81.54% balanced accuracy, highlighting CT-ADE's potential to advance ADE prediction. CT-ADE provides an essential tool for researchers aiming to leverage the power of artificial intelligence and machine learning to enhance patient safety and minimize the impact of ADEs on pharmaceutical research and development. Researchers interested in using the CT-ADE dataset can find all necessary resources at this https URL.

Abstract (translated)

药物不良反应（ADEs）对临床研究和公共卫生产生重大影响，导致临床试验失败和医疗费用增加。准确预测和管理ADEs对提高更安全、更有效的药物开发至关重要。为了支持这一努力，我们引入了CT-ADE，一个专门为增强ADEs预测建模的新数据集。包含从临床试验结果中提取的超过12,000个实例，CT-ADE数据集整合了药物、患者人口和上下文信息，为多标签ADE分类任务提供了一个全面的资源，以开发高级预测模型。为了反映ADEs的复杂性，在MedDRA语义层的系统器官级别进行注释。使用基线模型进行初步分析已经取得了良好的成果，实现了73.33%的F1得分和81.54%的平衡准确率，突出了CT-ADE在提高ADE预测方面的潜力。CT-ADE为研究人员利用人工智能和机器学习加强患者安全并减轻ADEs对制药研究和开发产生影响提供了一个重要的工具。对使用CT-ADE数据集感兴趣的研究人员可以在该链接找到所有必要的资源。

URL

https://arxiv.org/abs/2404.12827

PDF

https://arxiv.org/pdf/2404.12827.pdf
Neural Semantic Parsing with Extremely Rich Symbolic Meaning Representations

2024-04-19 08:06:01

Xiao Zhang, Gosse Bouma, Johan Bos

arXiv_CL

arXiv_CL Ontology Knowledge
Abstract

Current open-domain neural semantic parsers show impressive performance. However, closer inspection of the symbolic meaning representations they produce reveals significant weaknesses: sometimes they tend to merely copy character sequences from the source text to form symbolic concepts, defaulting to the most frequent word sense based on the training distribution. By leveraging the hierarchical structure of a lexical ontology, we introduce a novel compositional symbolic representation for concepts based on their position in the taxonomical hierarchy. This representation provides richer semantic information and enhances interpretability. We introduce a neural "taxonomical" semantic parser to utilize this new representation system of predicates, and compare it with a standard neural semantic parser trained on the traditional meaning representation format, employing a novel challenge set and evaluation metric. Our experimental findings demonstrate that the taxonomical model, trained on much richer and more complex meaning representations, performs slightly below the traditional model on the standard evaluation metrics, but outperforms it when dealing with out-of-vocabulary concepts. This finding is encouraging for research in computational semantics that aims to combine data-driven distributional meanings with knowledge-based symbolic representations.

Abstract (translated)

目前公开领域的神经语义解析器表现出令人印象深刻的性能。然而，对其产生的符号意义表示的近距离观察揭示了显著的弱点：有时候它们倾向于仅仅从源文本中复制字符序列以形成符号概念，默认为基于训练分布中最常见单词意义的最频词汇。通过利用词汇本体的层次结构，我们引入了一种基于它们在分类层次结构中的位置的新组合符号表示概念。这种表示提供了更丰富的语义信息并提高了可解释性。我们引入了一个神经“语义分类”语义解析器，用于利用这种基于命题的新表示系统，并将其与使用传统意义表示格式训练的标准神经语义解析器进行比较。我们的实验结果表明，基于更丰富和复杂语义表示的语义模型在标准评估指标上的性能略微低于使用标准评估指标的传统模型，但在处理非词汇概念时表现优异。这一发现对于旨在将数据驱动的分布语义与知识驱动的符号表示相结合的计算语义研究来说是有益的。

URL

https://arxiv.org/abs/2404.12698

PDF

https://arxiv.org/pdf/2404.12698.pdf
AmbigDocs: Reasoning across Documents on Different Entities under the Same Name

2024-04-18 18:12:01

Yoonsang Lee, Xi Ye, Eunsol Choi

arXiv_CL

arXiv_CL Ontology Language_Model
Abstract

Different entities with the same name can be difficult to distinguish. Handling confusing entity mentions is a crucial skill for language models (LMs). For example, given the question "Where was Michael Jordan educated?" and a set of documents discussing different people named Michael Jordan, can LMs distinguish entity mentions to generate a cohesive answer to the question? To test this ability, we introduce a new benchmark, AmbigDocs. By leveraging Wikipedia's disambiguation pages, we identify a set of documents, belonging to different entities who share an ambiguous name. From these documents, we generate questions containing an ambiguous name and their corresponding sets of answers. Our analysis reveals that current state-of-the-art models often yield ambiguous answers or incorrectly merge information belonging to different entities. We establish an ontology categorizing four types of incomplete answers and automatic evaluation metrics to identify such categories. We lay the foundation for future work on reasoning across multiple documents with ambiguous entities.

Abstract (translated)

具有相同名称的不同实体可能很难区分。处理令人困惑的实体提及是语言模型（LMs）的一项关键技能。例如，给定问题“迈克尔·乔丹在哪里受教育？”以及一系列讨论不同名为迈克尔·乔丹的人的文件，LMs能否区分实体提及并生成针对问题的连贯答案？为了测试这种能力，我们引入了一个新的基准，AmbigDocs。通过利用维基百科的歧义页面，我们找到了一组属于不同实体的具有模糊名称的文档。从这些文档中，我们生成包含模糊名称和相关答案的问题。我们的分析显示，当前最先进的模型通常会产生模糊的答案或错误地合并来自不同实体的信息。我们建立了一个分类为四种不完整答案的元数据模型和自动评估指标，以识别这些类别。我们在跨多个具有模糊实体的文档之间进行推理的基础之上，为未来的研究工作奠定了基础。

URL

https://arxiv.org/abs/2404.12447

PDF

https://arxiv.org/pdf/2404.12447.pdf
Incremental Bootstrapping and Classification of Structured Scenes in a Fuzzy Ontology

2024-04-17 20:51:13

Luca Buoncompagni, Fulvio Mastrogiovanni

arXiv_AI

arXiv_AI Classification Ontology Knowledge
Abstract

We foresee robots that bootstrap knowledge representations and use them for classifying relevant situations and making decisions based on future observations. Particularly for assistive robots, the bootstrapping mechanism might be supervised by humans who should not repeat a training phase several times and should be able to refine the taught representation. We consider robots that bootstrap structured representations to classify some intelligible categories. Such a structure should be incrementally bootstrapped, i.e., without invalidating the identified category models when a new additional category is considered. To tackle this scenario, we presented the Scene Identification and Tagging (SIT) algorithm, which bootstraps structured knowledge representation in a crisp OWL-DL ontology. Over time, SIT bootstraps a graph representing scenes, sub-scenes and similar scenes. Then, SIT can classify new scenes within the bootstrapped graph through logic-based reasoning. However, SIT has issues with sensory data because its crisp implementation is not robust to perception noises. This paper presents a reformulation of SIT within the fuzzy domain, which exploits a fuzzy DL ontology to overcome the robustness issues. By comparing the performances of fuzzy and crisp implementations of SIT, we show that fuzzy SIT is robust, preserves the properties of its crisp formulation, and enhances the bootstrapped representations. On the contrary, the fuzzy implementation of SIT leads to less intelligible knowledge representations than the one bootstrapped in the crisp domain.

Abstract (translated)

我们预计将出现能够引导知识表示的机器人，并将其用于分类相关情况并根据未来观察结果做出决策的机器人。特别是辅助机器人，引导机制可能由人类监督，他们不应该重复训练阶段多次，并且应该能够精炼所教授的表示。我们认为，引导结构化表示以分类一些可解释类别的机器人。这种结构应该通过逐步引导来进行，即在考虑新增类别时不会破坏已确定的类别模型。为解决这种情况，我们提出了Scene Identification and Tagging (SIT)算法，它在 crisp OWL-DL 上下文中引导结构化知识表示。随着时间的推移，SIT 通过基于逻辑推理绘制场景、子场景和类似场景的图。然后，SIT 通过逻辑推理对引导的图中的新场景进行分类。然而，SIT 在感官数据方面存在问题，因为其明确的实现对感知噪声不具有鲁棒性。本文在模糊领域对SIT进行了重新表述，利用模糊DL 上下文克服了鲁棒性问题。通过比较模糊和明确实现SIT的性能，我们证明了模糊SIT具有鲁棒性，保留了其明确的公式的性质，并增强了引导的表示。相反，模糊实现SIT导致生成的知识表示比在清晰领域引导的更不清晰。

URL

https://arxiv.org/abs/2404.11744

PDF

https://arxiv.org/pdf/2404.11744.pdf
Towards Complex Ontology Alignment using Large Language Models

2024-04-16 07:13:22

Reihaneh Amini, Sanaz Saki Norouzi, Pascal Hitzler, Reza Amini

arXiv_AI

arXiv_AI Relation Ontology Language_Model
Abstract

Ontology alignment, a critical process in the Semantic Web for detecting relationships between different ontologies, has traditionally focused on identifying so-called "simple" 1-to-1 relationships through class labels and properties comparison. The more practically useful exploration of more complex alignments remains a hard problem to automate, and as such is largely underexplored; in application practice it is usually done manually by ontology and domain experts. Recently, the surge in Natural Language Processing (NLP) capabilities, driven by advancements in Large Language Models (LLMs), presents new opportunities for enhancing ontology engineering practices, including ontology alignment tasks. This paper investigates the application of LLM technologies to tackle the complex ontology alignment challenge. Leveraging a prompt-based approach and integrating rich ontology content, so-called modules, our work constitutes a significant advance towards automating the complex alignment task.

Abstract (translated)

知识图谱对齐，作为一个在语义网中检测不同知识图谱之间关系的关键过程，通常集中在通过类标签和属性比较识别所谓的“简单”1对1关系。更实际可行的对更复杂对齐的探索仍然是一个难以自动化的困难问题，因此它仍然被大大忽视。即在应用实践中，通常是由本体和领域专家手动完成的。最近，自然语言处理（NLP）能力的突飞猛进，受到大型语言模型（LLMs）的进步，为增强语义工程实践提供了新的机会，包括语义对齐任务。本文调查了LLM技术在解决复杂语义对齐挑战中的应用。我们利用提示式方法并整合了丰富的语义内容，所谓的模块，这使得我们的工作在自动解决复杂对齐任务方面取得了显著的进展。

URL

https://arxiv.org/abs/2404.10329

PDF

https://arxiv.org/pdf/2404.10329.pdf
LLMs4OM: Matching Ontologies with Large Language Models

2024-04-16 06:55:45

Hamed Babaei Giglou, Jennifer D'Souza, Sören Auer

arXiv_AI

arXiv_AI Ontology Knowledge Language_Model Zero-Shot Matching
Abstract

Ontology Matching (OM) is a critical task in knowledge integration, where aligning heterogeneous ontologies facilitates data interoperability and knowledge sharing. Traditional OM systems often rely on expert knowledge or predictive models, with limited exploration of the potential of Large Language Models (LLMs). We present the LLMs4OM framework, a novel approach to evaluate the effectiveness of LLMs in OM tasks. This framework utilizes two modules for retrieval and matching, respectively, enhanced by zero-shot prompting across three ontology representations: concept, concept-parent, and concept-children. Through comprehensive evaluations using 20 OM datasets from various domains, we demonstrate that LLMs, under the LLMs4OM framework, can match and even surpass the performance of traditional OM systems, particularly in complex matching scenarios. Our results highlight the potential of LLMs to significantly contribute to the field of OM.
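
The three ontology representations named above (concept, concept-parent, concept-children) can be pictured as three ways of turning one ontology entry into text before it is placed in a zero-shot prompt. The sketch below is an illustration only: the toy ontology, the function names, and the prompt wording are hypothetical and are not taken from the LLMs4OM framework.

```python
from typing import Dict, List

# Hypothetical toy ontology: each concept lists its parents and children.
ONTOLOGY: Dict[str, Dict[str, List[str]]] = {
    "heart": {
        "parents": ["organ"],
        "children": ["left ventricle", "right ventricle"],
    },
}


def represent(concept: str,
              ontology: Dict[str, Dict[str, List[str]]],
              mode: str) -> str:
    """Render a concept as text in one of the three representations."""
    entry = ontology[concept]
    if mode == "concept":
        return concept
    if mode == "concept-parent":
        return f"{concept} (parents: {', '.join(entry['parents'])})"
    if mode == "concept-children":
        return f"{concept} (children: {', '.join(entry['children'])})"
    raise ValueError(f"unknown representation mode: {mode}")


def matching_prompt(source_text: str, target_text: str) -> str:
    """Build an illustrative zero-shot matching prompt (wording assumed)."""
    return (
        "Do the following two ontology concepts refer to the same thing? "
        "Answer yes or no.\n"
        f"Source: {source_text}\n"
        f"Target: {target_text}"
    )


if __name__ == "__main__":
    for mode in ("concept", "concept-parent", "concept-children"):
        print(represent("heart", ONTOLOGY, mode))
    print(matching_prompt(represent("heart", ONTOLOGY, "concept-parent"),
                          "cardiac organ"))
```

Adding parents or children gives the model hierarchical context that a bare label lacks, at the cost of longer prompts.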

Abstract (translated)

知识集成中的元数据匹配（OM）是一个关键任务，其中对异构知识本体的对齐有助于促进数据互操作性和知识共享。传统的OM系统通常依赖于专家知识或预测模型，对大型语言模型的潜力探索有限。我们提出了LLMs4OM框架，一种评估LLM在OM任务中有效性的新方法。该框架采用两个模块进行检索和匹配，分别通过三个知识表示层的零散提示进行加强：概念、概念父体和概念子体。通过使用各种领域的20个OM数据集进行全面评估，我们证明了LLM在LLMs4OM框架下可以匹配甚至超过传统OM系统的表现，特别是在复杂匹配场景中。我们的结果突出了LLM在OM领域显著贡献的潜力。

URL

https://arxiv.org/abs/2404.10317

PDF

https://arxiv.org/pdf/2404.10317.pdf
OWLOOP: Interfaces for Mapping OWL Axioms into OOP Hierarchies

2024-04-14 17:07:59

Luca Buoncompagni, Fulvio Mastrogiovanni

arXiv_AI

arXiv_AI Face Ontology
Abstract

The paper tackles the issue of mapping logic axioms formalised in the Web Ontology Language (OWL) within the Object-Oriented Programming (OOP) paradigm. The issues of mapping OWL axiom hierarchies and OOP object hierarchies are due to OWL-based reasoning algorithms, which might change an OWL hierarchy at runtime; in contrast, OOP hierarchies are usually defined as static structures. Although programming paradigms based on reflection allow changing the OOP hierarchies at runtime and mapping OWL axioms dynamically, there are no currently available mechanisms that do not limit the reasoning algorithms. Thus, the factory-based paradigm is typically used since it decouples the OWL and OOP hierarchies. However, the factory inhibits OOP polymorphism and introduces a paradigm shift with respect to widely accepted OOP paradigms. We present the OWLOOP API, which exploits the factory so as not to limit reasoning algorithms, and it provides novel OOP interfaces concerning the axioms in an ontology. OWLOOP is designed to limit the paradigm shift required for using ontologies while improving, through OOP-like polymorphism, the modularity of software architectures that exploit logic reasoning. The paper details our OWL to OOP mapping mechanism, and it shows the benefits and limitations of OWLOOP through examples concerning a robot in a smart environment.

Abstract (translated)

本文研究了在面向对象编程（OOP）范式内，将语义知识图谱（OWL）中的推理规则映射到OWL模型的逻辑轴理问题。OWL轴理层次结构和OOP对象层次结构的映射问题是因为基于OWL的推理算法可能会在运行时改变OWL层次结构；而OOP层次结构通常被定义为静态结构。尽管基于反思的编程范式允许在运行时改变OOP层次结构，并动态地映射OWL轴理，但目前没有可用的机制不限制推理算法。因此，通常是基于工厂的方法，因为它解耦了OWL和OOP层次结构。然而，工厂会抑制OOP多态性，并引入与广泛接受的多范式OOP范式不同的范式转变。我们提出了OWLOOP API，该API利用工厂来避免限制推理算法，并提供了关于语义模型中轴理的新颖OOP接口。OWLOOP旨在通过类似的OOP方式限制使用语义模型的范式转变，同时提高软件架构的模块性，通过逻辑推理来利用。本文详细介绍了我们的OWL到OOP映射机制，并通过一个智能环境中的人工机器人示例，展示了OWLOOP的优势和局限性。

URL

https://arxiv.org/abs/2404.09305

PDF

https://arxiv.org/pdf/2404.09305.pdf
Small Models Are Effective Cross-Domain Argument Extractors

2024-04-12 16:23:41

William Gantt, Aaron Steven White

arXiv_AI

arXiv_AI QA Ontology Transformer Action Zero-Shot Chat
Abstract

Effective ontology transfer has been a major goal of recent work on event argument extraction (EAE). Two methods in particular -- question answering (QA) and template infilling (TI) -- have emerged as promising approaches to this problem. However, detailed explorations of these techniques' ability to actually enable this transfer are lacking. In this work, we provide such a study, exploring zero-shot transfer using both techniques on six major EAE datasets at both the sentence and document levels. Further, we challenge the growing reliance on LLMs for zero-shot extraction, showing that vastly smaller models trained on an appropriate source ontology can yield zero-shot performance superior to that of GPT-3.5 or GPT-4.

Abstract (translated)

有效的本体转移一直是事件论证提取（EAE）领域最近工作的主要目标。尤其是问答（QA）和模板填充（TI）两种方法——被认为是解决这个问题的有前途的方法。然而，这些技术实际实现这一转移的能力的详细探讨还缺乏。在这项工作中，我们提供了这样的研究，探讨了在句子和文档级别上使用这两种技术进行零散转移。此外，我们还挑战了越来越多地依赖LLM进行零散提取的趋势，证明了在适当的本体架构上训练的小规模模型可以产生与GPT-3.5或GPT-4.0相当甚至更好的零散性能。

URL

https://arxiv.org/abs/2404.08579

PDF

https://arxiv.org/pdf/2404.08579.pdf
Interactive Ontology Matching with Cost-Efficient Learning

2024-04-11 11:53:14

Bin Cheng, Jonathan Fürst, Tobias Jacobs, Celia Garrido-Hidalgo

arXiv_AI

arXiv_AI Ontology Knowledge Matching
Abstract

The creation of high-quality ontologies is crucial for data integration and knowledge-based reasoning, specifically in the context of the rising data economy. However, automatic ontology matchers are often bound to the heuristics they are based on, leaving many matches unidentified. Interactive ontology matching systems involving human experts have been introduced, but they do not solve the fundamental issue of flexibly finding additional matches outside the scope of the implemented heuristics, even though this is highly demanded in industrial settings. Active machine learning methods appear to be a promising path towards a flexible interactive ontology matcher. However, off-the-shelf active learning mechanisms suffer from low query efficiency due to extreme class imbalance, resulting in a last-mile problem where high human effort is required to identify the remaining matches. To address the last-mile problem, this work introduces DualLoop, an active learning method tailored to ontology matching. DualLoop offers three main contributions: (1) an ensemble of tunable heuristic matchers, (2) a short-term learner with a novel query strategy adapted to highly imbalanced data, and (3) long-term learners to explore potential matches by creating and tuning new heuristics. We evaluated DualLoop on three datasets of varying sizes and domains. Compared to existing active learning methods, we consistently achieved better F1 scores and recall, reducing the expected query cost spent on finding 90% of all matches by over 50%. Compared to traditional interactive ontology matchers, we are able to find additional, last-mile matches. Finally, we detail the successful deployment of our approach within an actual product and report its operational performance results within the Architecture, Engineering, and Construction (AEC) industry sector, showcasing its practical value and efficiency.

Abstract (translated)

高质量本体论的创建对于数据集成和基于知识的推理至关重要，尤其是在数据经济迅速崛起的背景下。然而，自动本体论匹配器通常受到其基于的启发式约束，导致许多匹配无法确定。已经引入了涉及人类专家的交互式本体论匹配系统，但这些系统并未解决实施启发式约束的基本问题，尽管在工业环境中这一点非常重要。积极机器学习方法似乎是通往具有灵活性的交互式本体论匹配器的有望之路。然而，由于极端的类别不平衡，现成的积极学习机制导致查询效率较低，导致最后1公里问题，需要高人类努力来确定剩余的匹配。为解决最后1公里问题，本文引入了DualLoop，一种专为本体论匹配的积极学习方法。DualLoop 带来了三个主要贡献：（1）可调整的启发式匹配器的集合；（2）适应高度不平衡数据的新查询策略；（3）创建并调整新本体论以探索潜在匹配。我们在三个不同规模和领域的数据集上评估了DualLoop。与现有积极学习方法相比，我们始终获得了更好的F1分数和召回，将预计查询成本用于找到90%的匹配降低了50%以上。与传统交互式本体论匹配器相比，我们能够找到额外的最后1公里匹配。最后，我们详细介绍了将我们的方法成功部署在实际产品中的情况，并报告了其在建筑、工程和 Construction（AEC）行业部门中的操作性能结果，展示了其实用价值和效率。

URL

https://arxiv.org/abs/2404.07663

PDF

https://arxiv.org/pdf/2404.07663.pdf
Building A Knowledge Graph to Enrich ChatGPT Responses in Manufacturing Service Discovery

2024-04-09 18:46:46

Yunqing Li, Binil Starly

arXiv_AI

arXiv_AI Embedding Ontology Knowledge Knowledge_Graph Language_Model Transformer Pose Chat
Abstract

Sourcing and identification of new manufacturing partners is crucial for manufacturing system integrators to enhance agility and reduce risk through supply chain diversification in the global economy. The advent of advanced large language models has captured significant interest, due to their ability to generate comprehensive and articulate responses across a wide range of knowledge domains. However, these models often fall short in accuracy and completeness when responding to domain-specific inquiries, particularly in areas like manufacturing service discovery. This research explores the potential of leveraging Knowledge Graphs in conjunction with ChatGPT to streamline the process for prospective clients in identifying small manufacturing enterprises. In this study, we propose a method that integrates bottom-up ontology with advanced machine learning models to develop a Manufacturing Service Knowledge Graph from an array of structured and unstructured data sources, including the digital footprints of small-scale manufacturers throughout North America. The Knowledge Graph and the learned graph embedding vectors are leveraged to tackle intricate queries within the digital supply chain network, responding with enhanced reliability and greater interpretability. The proposed approach is scalable to millions of entities that can be distributed to form a global Manufacturing Service Knowledge Network Graph that can potentially interconnect multiple types of Knowledge Graphs that span industry sectors, geopolitical boundaries, and business domains. The dataset developed for this study, now publicly accessible, encompasses more than 13,000 manufacturers' weblinks, manufacturing services, certifications, and location entity types.

Abstract (translated)

采购和识别新制造商合作伙伴对全球经济中的供应链多元化至关重要，这可以提高制造系统集成商的敏捷性，并通过供应链多元化提高风险降低。先进的大型语言模型的出现引起了广泛关注，因为它们能够生成全面且明确的回答，涵盖广泛的领域知识。然而，当回答领域特定问题时，系统往往存在准确性和完整性不足的情况，特别是在制造业服务发现领域。这项研究探讨了在知识图谱与 ChatGPT 的结合下，简化潜在客户在识别小制造企业过程中的可能性。在本研究中，我们提出了一种方法，将自下而上的本体与先进机器学习模型相结合，从包括北美地区小型制造商的数字足迹在内的一系列结构和非结构化数据源中开发出制造业服务知识图。知识图和学到的图嵌入向量被用来处理数字供应链网络中的复杂查询，并回应提高可靠性和增强可解释性的答案。所提出的方法具有可扩展性，可以将数百万实体分配到形成一个全球制造业服务知识网络图，这个网络图可能连接多个跨越行业部门、地理政治边界和企业领域的知识图。为这项研究创建的数据集，现已成为公开可访问的数据库，包括13,000多个制造商网站、制造业服务、认证和位置实体类型。

URL

https://arxiv.org/abs/2404.06571

PDF

https://arxiv.org/pdf/2404.06571.pdf
Iof-maint -- Modular maintenance ontology

2024-04-08 06:40:03

Melinda Hodkiewicz, Caitlin Woods, Matt Selway, Markus Stumptner

arXiv_AI

arXiv_AI Relation Ontology Action
Abstract

In this paper we present a publicly-available maintenance ontology (Iof-maint). Iof-maint is a modular ontology aligned with the Industrial Ontology Foundry Core (IOF Core) and contains 20 classes and 2 relations. It provides a set of maintenance-specific terms used in a wide variety of practical data-driven use cases. Iof-maint supports OWL DL reasoning, is documented, and is actively maintained on GitHub. In this paper, we describe the evolution of the Iof-maint reference ontology based on the extraction of common concepts identified in a number of application ontologies working with industry maintenance work order, procedure and failure mode data.

Abstract (translated)

在本文中，我们提出了一个公开维护元数据（Iof-maint）。Iof-maint是一个与工业知识库（IOF Core）对齐的模块化元数据，包含20个类和2个关系。它提供了一组用于各种实际数据驱动用例的维护特定术语。Iof-maint支持OWL DL推理，已在GitHub上进行了记录，并正在积极维护。在本文中，我们描述了Iof-maint参考元数据基于从多个应用 ontology 中提取共性概念的演变。

URL

https://arxiv.org/abs/2404.05224

PDF

https://arxiv.org/pdf/2404.05224.pdf
Towards Reliable and Empathetic Depression-Diagnosis-Oriented Chats

2024-04-07 16:35:53

Kunyao Lan, Cong Ming, Binwei Yao, Lu Chen, Mengyue Wu

arXiv_AI

arXiv_AI QA Ontology Optimization Pose Emotion Dialog Chat
Abstract

Chatbots can serve as a viable tool for preliminary depression diagnosis via interactive conversations with potential patients. Nevertheless, the blend of task-oriented and chit-chat in diagnosis-related dialogues necessitates professional expertise and empathy. Such unique requirements challenge traditional dialogue frameworks geared towards single optimization goals. To address this, we propose an innovative ontology definition and generation framework tailored explicitly for depression diagnosis dialogues, combining the reliability of task-oriented conversations with the appeal of empathy-related chit-chat. We further apply the framework to D$^4$, the only existing public dialogue dataset on depression diagnosis-oriented chats. Exhaustive experimental results indicate significant improvements in task completion and emotional support generation in depression diagnosis, fostering a more comprehensive approach to task-oriented chat dialogue system development and its applications in digital mental health.

Abstract (translated)

聊天机器人可以成为通过与潜在患者进行具有交互性的对话进行初步抑郁诊断的可行工具。然而，在诊断相关的对话中混合了任务导向和闲聊，需要专业知识和同理心。这种独特的需求挑战了针对单一优化目标的傳統對話框架。为了应对这个问题，我们提出了一个专门针对抑郁诊断对话的創新本体定义和生成框架，结合了任务导向对话的可靠性和 empathy 相关的闲聊魅力。我们进一步将该框架应用于 D$^4，这是唯一一个关于抑郁诊断聊天数据的公共对话数据集。完整的实验结果表明，在抑郁诊断中，任务完成和情感支持生成的表现都有显著提高，促进了更全面的任务导向聊天对话系统开发和其在数字心理健康领域的应用。

URL

https://arxiv.org/abs/2404.05012

PDF

https://arxiv.org/pdf/2404.05012.pdf
Large language models as oracles for instantiating ontologies with domain-specific knowledge

2024-04-05 14:04:07

Giovanni Ciatto, Andrea Agiollo, Matteo Magnini, Andrea Omicini

arXiv_AI

arXiv_AI Relation Ontology Knowledge Language_Model Pose
Abstract

Background. Endowing intelligent systems with semantic data commonly requires designing and instantiating ontologies with domain-specific knowledge. Especially in the early phases, those activities are typically performed manually by human experts, possibly leveraging their own experience. The resulting process is therefore time-consuming, error-prone, and often biased by the personal background of the ontology designer. Objective. To mitigate that issue, we propose a novel domain-independent approach to automatically instantiate ontologies with domain-specific knowledge, by leveraging large language models (LLMs) as oracles. Method. Starting from (i) an initial schema composed of inter-related classes and properties and (ii) a set of query templates, our method queries the LLM multiple times, and generates instances for both classes and properties from its replies. Thus, the ontology is automatically filled with domain-specific knowledge, compliant with the initial schema. As a result, the ontology is quickly and automatically enriched with manifold instances, which experts may consider to keep, adjust, discard, or complement according to their own needs and expertise. Contribution. We formalise our method in a general way and instantiate it over various LLMs, as well as on a concrete case study. We report experiments rooted in the nutritional domain where an ontology of food meals and their ingredients is semi-automatically instantiated from scratch, starting from a categorisation of meals and their relationships. There, we analyse the quality of the generated ontologies and compare ontologies attained by exploiting different LLMs. Finally, we provide a SWOT analysis of the proposed method.

Abstract (translated)

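Background. Endowing intelligent systems with semantic data usually requires designing and instantiating ontologies with domain-specific knowledge. Especially in the early phases, these activities are typically carried out manually by human experts, possibly drawing on their own experience. The resulting process is therefore time-consuming, error-prone, and often biased by the personal background of the ontology designer. Objective. To mitigate this issue, we propose a novel, domain-independent approach that automatically instantiates ontologies with domain-specific knowledge by using large language models (LLMs) as oracles. Method. Starting from (i) an initial schema composed of interrelated classes and properties and (ii) a set of query templates, our method queries the LLM multiple times and generates instances of both classes and properties from its replies. The ontology is thus automatically filled with domain-specific knowledge that complies with the initial schema. As a result, the ontology is quickly and automatically enriched with many instances, which experts may keep, adjust, discard, or complement according to their needs and expertise. Contribution. We formalise our method in a general way and instantiate it over various LLMs as well as on a concrete case study. We report experiments in the nutritional domain, where an ontology of food meals and their ingredients is semi-automatically instantiated from scratch, starting from a categorisation of meals and their relationships. There, we analyse the quality of the generated ontologies and compare the ontologies obtained with different LLMs. Finally, we provide a SWOT analysis of the proposed method.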

URL

https://arxiv.org/abs/2404.04108

PDF

https://arxiv.org/pdf/2404.04108.pdf
The Artificial Intelligence Ontology: LLM-assisted construction of AI concept hierarchies

2024-04-03 20:08:15

Marcin P. Joachimiak, Mark A. Miller, J. Harry Caufield, Ryan Ly, Nomi L. Harris, Andrew Tritt, Christopher J. Mungall, Kristofer E. Bouchard

arXiv_AI

arXiv_AI Deep_Learning Relation Ontology Language_Model
Abstract

The Artificial Intelligence Ontology (AIO) is a systematization of artificial intelligence (AI) concepts, methodologies, and their interrelations. Developed via manual curation, with the additional assistance of large language models (LLMs), AIO aims to address the rapidly evolving landscape of AI by providing a comprehensive framework that encompasses both technical and ethical aspects of AI technologies. The primary audience for AIO includes AI researchers, developers, and educators seeking standardized terminology and concepts within the AI domain. The ontology is structured around six top-level branches: Networks, Layers, Functions, LLMs, Preprocessing, and Bias, each designed to support the modular composition of AI methods and facilitate a deeper understanding of deep learning architectures and ethical considerations in AI. AIO's development utilized the Ontology Development Kit (ODK) for its creation and maintenance, with its content being dynamically updated through AI-driven curation support. This approach not only ensures the ontology's relevance amidst the fast-paced advancements in AI but also significantly enhances its utility for researchers, developers, and educators by simplifying the integration of new AI concepts and methodologies. The ontology's utility is demonstrated through the annotation of AI methods data in a catalog of AI research publications and the integration into the BioPortal ontology resource, highlighting its potential for cross-disciplinary research. The AIO ontology is open source and is available on GitHub and BioPortal.

Abstract (translated)

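The Artificial Intelligence Ontology (AIO) is a systematization of artificial intelligence (AI) concepts, methodologies, and their interrelations. AIO was developed through manual curation with additional assistance from large language models (LLMs). It aims to address the rapidly evolving AI landscape by providing a comprehensive framework covering both the technical and ethical aspects of AI technologies. Its primary audience includes AI researchers, developers, and educators seeking standardized terminology and concepts within the AI domain. The ontology is structured around six top-level branches: Networks, Layers, Functions, LLMs, Preprocessing, and Bias, each designed to support the modular composition of AI methods and to foster a deeper understanding of deep learning architectures and ethical considerations in AI. AIO was created and is maintained with the Ontology Development Kit (ODK), and its content is dynamically updated through AI-driven curation support. This approach keeps the ontology relevant amid rapid advances in AI and makes it more useful to researchers, developers, and educators by simplifying the integration of new AI concepts and methodologies. The ontology's utility is demonstrated by annotating AI methods data in a catalog of AI research publications and by its integration into the BioPortal ontology resource, highlighting its potential for cross-disciplinary research. AIO is open source and is available on GitHub and BioPortal.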

URL

https://arxiv.org/abs/2404.03044

PDF

https://arxiv.org/pdf/2404.03044.pdf
Event Detection from Social Media for Epidemic Prediction

2024-04-02 06:31:17

Tanmay Parekh, Anh Mac, Jiarui Yu, Yuxuan Dong, Syed Shahriar, Bonnie Liu, Eric Yang, Kuan-Hao Huang, Wei Wang, Nanyun Peng, Kai-Wei Chang

arXiv_CL

arXiv_CL Detection Ontology Prediction Action
Abstract

Social media is an easy-to-access platform providing timely updates about societal trends and events. Discussions regarding epidemic-related events such as infections, symptoms, and social interactions can be crucial for informing policymaking during epidemic outbreaks. In our work, we pioneer exploiting Event Detection (ED) for better preparedness and early warnings of any upcoming epidemic by developing a framework to extract and analyze epidemic-related events from social media posts. To this end, we curate an epidemic event ontology comprising seven disease-agnostic event types and construct a Twitter dataset SPEED with human-annotated events focused on the COVID-19 pandemic. Experimentation reveals how ED models trained on COVID-based SPEED can effectively detect epidemic events for three unseen epidemics of Monkeypox, Zika, and Dengue; while models trained on existing ED datasets fail miserably. Furthermore, we show that reporting sharp increases in the extracted events by our framework can provide warnings 4-9 weeks earlier than the WHO epidemic declaration for Monkeypox. This utility of our framework lays the foundations for better preparedness against emerging epidemics.

Abstract (translated)

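Social media is an easy-to-access platform that provides timely updates on societal trends and events. Discussions of epidemic-related events such as infections, symptoms, and social interactions can be crucial for informing policymaking during outbreaks. In our work, we pioneer the use of Event Detection (ED) for better preparedness and early warning of any upcoming epidemic by developing a framework to extract and analyze epidemic-related events from social media posts. To this end, we curate an epidemic event ontology comprising seven disease-agnostic event types and construct SPEED, a Twitter dataset with human-annotated events focused on the COVID-19 pandemic. Experiments show that ED models trained on the COVID-based SPEED can effectively detect epidemic events for three unseen epidemics, Monkeypox, Zika, and Dengue, whereas models trained on existing ED datasets fail to do so. Furthermore, we show that reporting sharp increases in the events extracted by our framework can provide warnings 4-9 weeks earlier than the WHO epidemic declaration for Monkeypox. This utility lays the foundation for better preparedness against emerging epidemics.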

URL

https://arxiv.org/abs/2404.01679

PDF

https://arxiv.org/pdf/2404.01679.pdf
Recover: A Neuro-Symbolic Framework for Failure Detection and Recovery

2024-03-31 17:54:22

Cristina Cornelio, Mohammed Diab

arXiv_AI

arXiv_AI Detection Ontology Language_Model
Abstract

Recognizing failures during task execution and implementing recovery procedures is challenging in robotics. Traditional approaches rely on the availability of extensive data or a tight set of constraints, while more recent approaches leverage large language models (LLMs) to verify task steps and replan accordingly. However, these methods often operate offline, necessitating scene resets and incurring high costs. This paper introduces Recover, a neuro-symbolic framework for online failure identification and recovery. By integrating ontologies, logical rules, and LLM-based planners, Recover exploits symbolic information to enhance the ability of LLMs to generate recovery plans and also to decrease the associated costs. In order to demonstrate the capabilities of our method in a simulated kitchen environment, we introduce OntoThor, an ontology describing the AI2Thor simulator setting. Empirical evaluation shows that OntoThor's logical rules accurately detect all failures in the analyzed tasks, and that Recover considerably outperforms, for both failure detection and recovery, a baseline method reliant solely on LLMs.

Abstract (translated)

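In robotics, recognizing failures during task execution and implementing recovery procedures is challenging. Traditional approaches rely on the availability of extensive data or a tight set of constraints, while more recent approaches use large language models (LLMs) to verify task steps and replan accordingly. However, these methods often operate offline, requiring scene resets and incurring high costs. This paper introduces Recover, a neuro-symbolic framework for online failure identification and recovery. By integrating ontologies, logical rules, and LLM-based planners, Recover exploits symbolic information to enhance the ability of LLMs to generate recovery plans and to reduce the associated costs. To demonstrate the capabilities of our method in a simulated kitchen environment, we introduce OntoThor, an ontology describing the AI2Thor simulator setting. Empirical evaluation shows that OntoThor's logical rules accurately detect all failures in the analyzed tasks, and that Recover considerably outperforms, in both failure detection and recovery, a baseline method relying solely on LLMs.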

URL

https://arxiv.org/abs/2404.00756

PDF

https://arxiv.org/pdf/2404.00756.pdf