Traditional ontology design emphasizes disjoint and exhaustive top-level distinctions such as continuant vs. occurrent, abstract vs. concrete, or type vs. instance. These distinctions are used to structure unified hierarchies where every entity is classified under a single upper-level category. Wikidata, by contrast, does not enforce a singular foundational taxonomy. Instead, it accommodates multiple classification axes simultaneously under the shared root class entity. This paper analyzes the structural implications of Wikidata's polyhierarchical and multi-axial design. The Wikidata architecture enables a scalable and modular approach to ontology construction, especially suited to collaborative and evolving knowledge graphs.
https://arxiv.org/abs/2512.12260
Automated eligibility systems increasingly determine access to essential public benefits, but the explanations they generate often fail to reflect the legal rules that authorize those decisions. This thesis develops a legally grounded explainability framework that links system-generated decision justifications to the statutory constraints of CalFresh, California's Supplemental Nutrition Assistance Program. The framework combines a structured ontology of eligibility requirements derived from the state's Manual of Policies and Procedures (MPP), a rule extraction pipeline that expresses statutory logic in a verifiable formal representation, and a solver-based reasoning layer to evaluate whether the explanation aligns with governing law. Case evaluations demonstrate the framework's ability to detect legally inconsistent explanations, highlight violated eligibility rules, and support procedural accountability by making the basis of automated determinations traceable and contestable.
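The solver-based consistency check can be sketched in miniature: if an explanation cites a rule as grounds for denial, that rule's formalized predicate must actually be violated by the case facts. The rule IDs below echo the MPP citation style, but the thresholds and predicates are invented for illustration, not the thesis's actual formalism.

```python
# Hypothetical rule base; IDs and thresholds are illustrative, not real MPP provisions.
RULES = {
    "MPP-63-409": lambda facts: facts["gross_income"] <= 2430,  # income limit
    "MPP-63-503": lambda facts: facts["resources"] <= 2750,     # resource limit
}

def inconsistent_reasons(facts, cited_rules):
    """Rules cited as grounds for denial that the facts do NOT violate,
    i.e. legally inconsistent parts of the explanation."""
    return [r for r in cited_rules if RULES[r](facts)]

facts = {"gross_income": 2000, "resources": 3000}
# The explanation blames both income and resources; only the resource rule fails.
print(inconsistent_reasons(facts, ["MPP-63-409", "MPP-63-503"]))  # ['MPP-63-409']
```

Flagging `MPP-63-409` here is exactly the "legally inconsistent explanation" case: the cited income rule cannot justify the denial because the facts satisfy it.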
https://arxiv.org/abs/2512.12109
Large language models (LLMs) are increasingly touted as powerful tools for automating scientific information extraction. However, existing methods and tools often struggle with the realities of scientific literature: long-context documents, multi-modal content, and reconciling varied and inconsistent fine-grained information across multiple publications into standardized formats. These challenges are further compounded when the desired data schema or extraction ontology changes rapidly, making it difficult to re-architect or fine-tune existing systems. We present SciEx, a modular and composable framework that decouples key components including PDF parsing, multi-modal retrieval, extraction, and aggregation. This design streamlines on-demand data extraction while enabling extensibility and flexible integration of new models, prompting strategies, and reasoning mechanisms. We evaluate SciEx on datasets spanning three scientific topics for its ability to extract fine-grained information accurately and consistently. Our findings provide practical insights into both the strengths and limitations of current LLM-based pipelines.
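The decoupled design can be illustrated with a minimal composable pipeline where each stage is independently swappable; the stage names and toy logic below are our stand-ins, not SciEx's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Pipeline:
    stages: list = field(default_factory=list)

    def add(self, stage: Callable[[Any], Any]) -> "Pipeline":
        self.stages.append(stage)   # stages remain individually replaceable
        return self

    def run(self, doc: Any) -> Any:
        for stage in self.stages:
            doc = stage(doc)
        return doc

# Stand-in stages (names and logic are ours, not SciEx's):
parse     = lambda pdf: {"pages": pdf.split("\f")}                          # PDF parsing
retrieve  = lambda d: {**d, "hits": [p for p in d["pages"] if "Tc" in p]}   # retrieval
extract   = lambda d: {**d, "records": [{"Tc": 92.0} for _ in d["hits"]]}   # extraction
aggregate = lambda d: {"Tc_mean": sum(r["Tc"] for r in d["records"]) / len(d["records"])}

result = Pipeline().add(parse).add(retrieve).add(extract).add(aggregate).run(
    "abstract: Tc = 92 K\fmethods section")
print(result)  # {'Tc_mean': 92.0}
```

Swapping in a different extractor or a new prompting strategy only means replacing one `add(...)` call, which is the extensibility property the abstract describes.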
https://arxiv.org/abs/2512.10004
Ontology-based knowledge graph (KG) construction is a core technology that enables multidimensional understanding and advanced reasoning over domain knowledge. Industrial standards, in particular, contain extensive technical information and complex rules presented in highly structured formats that combine tables, scopes of application, constraints, exceptions, and numerical calculations, making KG construction especially challenging. In this study, we propose a method that organizes such documents into a hierarchical semantic structure, decomposes sentences and tables into atomic propositions derived from conditional and numerical rules, and integrates them into an ontology-knowledge graph through LLM-based triple extraction. Our approach captures both the hierarchical and logical structures of documents, effectively representing domain-specific semantics that conventional methods fail to reflect. To verify its effectiveness, we constructed rule, table, and multi-hop QA datasets, as well as a toxic clause detection dataset, from industrial standards, and implemented an ontology-aware KG-RAG framework for comparative evaluation. Experimental results show that our method achieves significant performance improvements across all QA types compared to existing KG-RAG approaches. This study demonstrates that reliable and scalable knowledge representation is feasible even for industrial documents with intertwined conditions, constraints, and scopes, contributing to future domain-specific RAG development and intelligent document management.
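A toy sketch of the decomposition step: a conditional clause from a standard is split into atomic condition/requirement propositions and emitted as triples. The regex here is a crude stand-in for the paper's LLM-based extraction, and the clause itself is invented.

```python
import re

# Invented clause from a hypothetical piping standard.
clause = "If pipe diameter > 50 mm, wall thickness shall be >= 3 mm."

def to_atoms(text):
    """Split a conditional rule sentence into atomic propositions."""
    m = re.match(r"If (.+?) (>=|>) (\d+) mm, (.+?) shall be (>=|>) (\d+) mm\.", text)
    cond = f"{m.group(1)} {m.group(2)} {m.group(3)} mm"
    req = f"{m.group(4)} {m.group(5)} {m.group(6)} mm"
    return cond, req

cond, req = to_atoms(clause)
triples = [("rule-1", "hasCondition", cond),    # atomic proposition 1
           ("rule-1", "hasRequirement", req)]   # atomic proposition 2
print(triples)
```

Each atomic proposition lands in its own triple, so conditional scope and numeric constraints survive the flattening into graph form.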
https://arxiv.org/abs/2512.08398
Ontologies are an important tool for structuring domain knowledge, but their development is a complex task that requires significant modelling and domain expertise. Ontology learning, aimed at automating this process, has seen advancements in the past decade with the improvement of Natural Language Processing techniques, and especially with the recent growth of Large Language Models (LLMs). This paper investigates the challenge of identifying axioms: fundamental ontology components that define logical relations between classes and properties. In this work, we introduce OntoAxiom, an ontology axiom benchmark, and systematically test LLMs on it for axiom identification, evaluating different prompting strategies, ontologies, and axiom types. The benchmark consists of nine medium-sized ontologies comprising a total of 17,118 triples and 2,771 axioms. We focus on subclass, disjoint, subproperty, domain, and range axioms. To evaluate LLM performance, we compare twelve LLMs under three shot settings and two prompting strategies: a Direct approach where we query all axioms at once, versus an Axiom-by-Axiom (AbA) approach, where each prompt queries for one axiom only. Our findings show that AbA prompting leads to higher F1 scores than the Direct approach. However, performance varies across axioms, suggesting that certain axioms are more challenging to identify. The domain also influences performance: the FOAF ontology achieves a score of 0.642 for the subclass axiom, while the music ontology reaches only 0.218. Larger LLMs outperform smaller ones, but smaller models may still be viable for resource-constrained settings. Although performance overall is not high enough to fully automate axiom identification, LLMs can provide valuable candidate axioms to support ontology engineers with the development and refinement of ontologies.
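The Direct-versus-AbA comparison ultimately reduces to scoring predicted axiom sets against a gold standard with F1. A minimal scorer, with mock axioms that are our own invention rather than benchmark content:

```python
def f1(predicted, gold):
    """Set-based F1 over predicted vs. gold axiom triples."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if not predicted or not gold or tp == 0:
        return 0.0
    p, r = tp / len(predicted), tp / len(gold)
    return 2 * p * r / (p + r)

# Invented example axioms (not from OntoAxiom):
gold = {("Performance", "subClassOf", "Event"),
        ("Instrument", "disjointWith", "Performance")}
predictions = {("Performance", "subClassOf", "Event"),
               ("Instrument", "subClassOf", "Event")}
print(f1(predictions, gold))  # 0.5
```

Running the scorer once per prompting strategy (and per axiom type) yields exactly the kind of breakdown the paper reports, e.g. the 0.642 vs. 0.218 subclass-axiom gap between FOAF and the music ontology.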
https://arxiv.org/abs/2512.05594
In the past decade, the United States has seen a surge in the amount of electronic health record (EHR) data, attributed to a favorable policy environment created by the Health Information Technology for Economic and Clinical Health (HITECH) Act of 2009 and the 21st Century Cures Act of 2016. Clinical notes for patients' assessments, diagnoses, and treatments are captured in these EHRs as free-form text by physicians, who spend a considerable amount of time entering and editing them. This manual note-writing consumes a doctor's valuable time, increases patients' waiting times, and can delay diagnoses. Large language models (LLMs) possess the ability to generate news articles that closely resemble human-written ones. We investigate the use of Chain-of-Thought (CoT) prompt engineering to improve the LLM's responses in clinical note generation. Our prompts take International Classification of Diseases (ICD) codes and basic patient information as input. We investigate a strategy that combines traditional CoT with semantic search results to improve the quality of the generated clinical notes. Additionally, we infuse a knowledge graph (KG) built from a clinical ontology to further enrich the domain-specific knowledge of the generated notes. We test our prompting technique on six clinical cases from the CodiEsp test dataset using GPT-4, and our results show that it outperforms clinical notes generated by standard one-shot prompts.
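A minimal sketch of how such a prompt might be assembled from ICD codes and basic patient information; the template wording, step ordering, and the code description table are our assumptions, not the paper's actual prompts.

```python
# Mock ICD-10 lookup table (one illustrative entry; not from the paper).
ICD_DESCRIPTIONS = {"J45.901": "Unspecified asthma with (acute) exacerbation"}

def build_cot_prompt(patient, icd_codes):
    """Assemble a Chain-of-Thought prompt from structured inputs."""
    lines = [f"Patient: {patient['age']}-year-old {patient['sex']}.",
             "Diagnoses (ICD-10):"]
    lines += [f"- {c}: {ICD_DESCRIPTIONS.get(c, 'unknown code')}" for c in icd_codes]
    lines += ["Let's think step by step: first summarize the presentation,",
              "then the assessment, then the plan, before writing the clinical note."]
    return "\n".join(lines)

prompt = build_cot_prompt({"age": 54, "sex": "female"}, ["J45.901"])
print(prompt)
```

The "Let's think step by step" scaffold is the generic CoT device; the paper's variant additionally folds in semantic search results and KG-derived context at this construction stage.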
https://arxiv.org/abs/2512.05256
We develop a general theory of semantic dynamics for large language models by formalizing them as Continuous State Machines (CSMs): smooth dynamical systems whose latent manifolds evolve under probabilistic transition operators. The associated transfer operator $P: L^2(M,\mu) \to L^2(M,\mu)$ encodes the propagation of semantic mass. Under mild regularity assumptions (compactness, ergodicity, bounded Jacobian), $P$ is compact with discrete spectrum. Within this setting, we prove the Semantic Characterization Theorem (SCT): the leading eigenfunctions of $P$ induce finitely many spectral basins of invariant meaning, each definable in an o-minimal structure over $\mathbb{R}$. Thus spectral lumpability and logical tameness coincide. This explains how discrete symbolic semantics can emerge from continuous computation: the continuous activation manifold collapses into a finite, logically interpretable ontology. We further extend the SCT to stochastic and adiabatic (time-inhomogeneous) settings, showing that slowly drifting kernels preserve compactness, spectral coherence, and basin structure.
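The spectral-basin picture admits a small numerical illustration: discretize a transfer operator as a row-stochastic matrix with two weakly coupled blocks, and the sign pattern of the eigenfunction for the second-largest eigenvalue recovers the basins. The toy kernel below is ours, not a model from the paper.

```python
import numpy as np

# Two 3-state "semantic basins" coupled with strength eps.
eps = 0.01
block = np.full((3, 3), 1 / 3)
P = np.block([[(1 - eps) * block, eps * block],
              [eps * block, (1 - eps) * block]])   # row-stochastic kernel

evals, evecs = np.linalg.eig(P)
order = np.argsort(-evals.real)
lam2 = evals.real[order[1]]            # spectral gap: lam2 = 1 - 2*eps here
v2 = evecs[:, order[1]].real
basins = (v2 > 0).astype(int)          # sign pattern of the eigenfunction
print(round(lam2, 4), basins)          # 0.98 and a clean 3+3 basin split
```

The near-unit eigenvalue cluster ({1, 1 - 2ε}) separated from the rest of the spectrum is the "spectral lumpability" of the SCT in miniature: the six continuous states collapse into two discrete, interpretable meaning classes.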
https://arxiv.org/abs/2512.05162
Large language models (LLMs) are often deployed as powerful yet opaque systems, leaving open how their internal memory and "self-like" behavior should be governed in a principled and auditable way. The Artificial Age Score (AAS) was previously introduced and mathematically justified through three theorems that characterise it as a metric of artificial memory aging. Building on this foundation, the present work develops an engineering-oriented, clause-based architecture that imposes law-like constraints on LLM memory and control. Twenty selected monads from Leibniz's Monadology are grouped into six bundles: ontology, dynamics, representation and consciousness, harmony and reason, body and organisation, and teleology, and each bundle is realised as an executable specification on top of the AAS kernel. Across six minimal Python implementations, these clause families are instantiated in numerical experiments acting on channel-level quantities such as recall scores, redundancy, and weights. Each implementation follows a four-step pattern: inputs and setup, clause implementation, numerical results, and implications for LLM design, emphasising that the framework is not only philosophically motivated but also directly implementable. The experiments show that the clause system exhibits bounded and interpretable behavior: AAS trajectories remain continuous and rate-limited, contradictions and unsupported claims trigger explicit penalties, and hierarchical refinement reveals an organic structure in a controlled manner. Dual views and goal-action pairs are aligned by harmony terms, and windowed drift in perfection scores separates sustained improvement from sustained degradation. Overall, the monad-based clause framework uses AAS as a backbone and provides a transparent, code-level blueprint for constraining and analyzing internal dynamics in artificial agents.
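The "continuous and rate-limited" AAS trajectories and explicit penalties can be mimicked with a one-line update rule; the constants and the penalty mechanism below are illustrative sketches, not the AAS definition from the cited theorems.

```python
def step_aas(aas, target, max_rate=0.05, penalty=0.0):
    """Move AAS toward target, bounding the per-step change (rate limit),
    then subtract any clause-triggered penalty. Constants are illustrative."""
    delta = max(-max_rate, min(max_rate, target - aas))
    return max(0.0, aas + delta - penalty)

traj = [0.0]
for _ in range(10):
    traj.append(step_aas(traj[-1], target=1.0))
print(traj[-1])   # 0.5: growth is capped at 0.05 per step, so 10 steps reach 0.5

# A contradiction or unsupported claim triggers an explicit penalty:
print(step_aas(traj[-1], target=1.0, penalty=0.2))  # drops despite the pull to 1.0
```

Bounded per-step deltas give the continuity/rate-limit property, and the penalty term models the clause families that punish contradictions, while the floor at zero keeps the score interpretable.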
https://arxiv.org/abs/2512.11835
Vision-language pretraining (VLP) has emerged as a powerful paradigm in medical image analysis, enabling representation learning from large-scale image-text pairs without relying on expensive manual annotations. However, existing methods often struggle with the noise inherent in web-collected data and the complexity of unstructured long medical texts. To address these challenges, we propose a novel VLP framework integrating a Multi-Agent data GENeration (MAGEN) system and Ontology-based Multi-Aspect Knowledge-Enhanced (O-MAKE) pretraining. First, MAGEN enhances data quality by synthesizing knowledge-enriched descriptions via a foundation model-assisted captioning and retrieval-based verification pipeline. Second, O-MAKE addresses the difficulty of learning from long, unstructured texts by decomposing them into distinct knowledge aspects. This facilitates fine-grained alignment at both global and patch levels, while explicitly modeling medical concept relationships through ontology-guided mechanisms. We validate our framework in the field of dermatology, where comprehensive experiments demonstrate the effectiveness of each component. Our approach achieves state-of-the-art zero-shot performance on disease classification and cross-modal retrieval tasks across eight datasets. Our code and the augmented dataset Derm1M-AgentAug, comprising over 400k skin-image-text pairs, will be released at this https URL.
https://arxiv.org/abs/2512.03445
As generative models become powerful, concerns around transparency, accountability, and copyright violations have intensified. Understanding how specific training data contributes to a model's output is critical. We introduce a framework for interpreting generative outputs through the automatic construction of ontology-aligned knowledge graphs (KGs). While automatic KG construction from natural text has advanced, extracting structured and ontology-consistent representations from visual content remains challenging due to the richness and multi-object nature of images. Leveraging multimodal large language models (LLMs), our method extracts structured triples from images, aligned with a domain-specific ontology. By comparing the KGs of generated and training images, we can trace potential influences, enabling copyright analysis, dataset transparency, and interpretable AI. We validate our method through experiments on locally trained models via unlearning, and on large-scale models through a style-specific experiment. Our framework supports the development of AI systems that foster human collaboration, creativity and stimulate curiosity.
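Comparing the KGs of generated and training images comes down to measuring triple-set overlap. A toy sketch using Jaccard similarity, with invented triples rather than output of the multimodal extractor:

```python
def jaccard(a, b):
    """Overlap between two triple sets, in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Mock KGs (illustrative triples, not extractor output):
generated = {("dog", "wears", "hat"), ("dog", "on", "beach"), ("sky", "is", "cloudy")}
training = {
    "img_001": {("dog", "wears", "hat"), ("dog", "on", "beach")},
    "img_002": {("cat", "on", "sofa")},
}

scores = {k: jaccard(generated, v) for k, v in training.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))  # img_001 0.67
```

Ranking training images by this overlap is the simplest form of the influence tracing described above; the paper's unlearning experiment then checks whether removing a high-overlap image actually changes the generated KG.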
https://arxiv.org/abs/2512.02713
Large Language Models (LLMs) encode factual knowledge within hidden parametric spaces that are difficult to inspect or control. While Sparse Autoencoders (SAEs) can decompose hidden activations into more fine-grained, interpretable features, they often struggle to reliably align these features with human-defined concepts, resulting in entangled and distributed feature representations. To address this, we introduce AlignSAE, a method that aligns SAE features with a defined ontology through a "pre-train, then post-train" curriculum. After an initial unsupervised training phase, we apply supervised post-training to bind specific concepts to dedicated latent slots while preserving the remaining capacity for general reconstruction. This separation creates an interpretable interface where specific relations can be inspected and controlled without interference from unrelated features. Empirical results demonstrate that AlignSAE enables precise causal interventions, such as reliable "concept swaps", by targeting single, semantically aligned slots.
https://arxiv.org/abs/2512.02004
Environmental, Social, and Governance (ESG) disclosure frameworks such as SASB, TCFD, and IFRS S2 require organizations to compute and report numerous metrics for compliance, yet these requirements are embedded in long, unstructured PDF documents that are difficult to interpret, standardize, and audit. Manual extraction is unscalable, while unconstrained large language model (LLM) extraction often produces inconsistent entities, hallucinated relationships, missing provenance, and high validation failure rates. We present OntoMetric, an ontology-guided framework that transforms ESG regulatory documents into validated, AI- and web-ready knowledge graphs. OntoMetric operates through a three-stage pipeline: (1) structure-aware segmentation using table-of-contents boundaries, (2) ontology-constrained LLM extraction that embeds the ESGMKG schema into prompts while enriching entities with semantic fields for downstream reasoning, and (3) two-phase validation that combines LLM-based semantic verification with rule-based schema checking across entity, property, and relationship levels (VR001-VR006). The framework preserves both segment-level and page-level provenance for audit traceability. Evaluated on five ESG standards (SASB Commercial Banks, SASB Semiconductors, TCFD, IFRS S2, AASB S2) totaling 228 pages and 60 segments, OntoMetric achieves 65-90% semantic accuracy and 80-90% schema compliance, compared to 3-10% for baseline unconstrained extraction, at approximately 0.01 to 0.02 USD per validated entity. Our results demonstrate that combining symbolic ontology constraints with neural extraction enables reliable, auditable knowledge graphs suitable for regulatory compliance and web integration, supporting downstream applications such as sustainable-finance analytics, transparency portals, and automated compliance tools.
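The rule-based schema-checking phase can be sketched as a table of named predicates applied to each extracted entity. The VR numbering follows the paper's labels, but the predicates below are our guesses at plausible checks, not the published definitions.

```python
# Illustrative validation rules (IDs reuse the paper's VR numbering;
# the predicate bodies are our assumptions).
def vr001_has_id(e):         return bool(e.get("id"))
def vr002_known_type(e):     return e.get("type") in {"Metric", "Framework", "Requirement"}
def vr003_has_provenance(e): return "page" in e.get("provenance", {})

CHECKS = {"VR001": vr001_has_id,
          "VR002": vr002_known_type,
          "VR003": vr003_has_provenance}

def validate(entity):
    """Return the IDs of all failed checks for one extracted entity."""
    return [rid for rid, check in CHECKS.items() if not check(entity)]

entity = {"id": "m-ghg-1", "type": "Metric", "provenance": {}}
print(validate(entity))  # ['VR003']: missing page-level provenance
```

An entity passes only with an empty failure list; keeping the provenance check explicit is what makes each validated entity audit-traceable back to a page.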
https://arxiv.org/abs/2512.01289
Knowledge graphs (KGs) provide structured, verifiable grounding for large language models (LLMs), but current LLM-based systems commonly use KGs as auxiliary structures for text retrieval, leaving their intrinsic quality underexplored. In this work, we propose Wikontic, a multi-stage pipeline that constructs KGs from open-domain text by extracting candidate triplets with qualifiers, enforcing Wikidata-based type and relation constraints, and normalizing entities to reduce duplication. The resulting KGs are compact, ontology-consistent, and well-connected; on MuSiQue, the correct answer entity appears in 96% of generated triplets. On HotpotQA, our triplets-only setup achieves 76.0 F1, and on MuSiQue 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that still require textual context. In addition, Wikontic attains state-of-the-art information-retention performance on the MINE-1 benchmark (86%), outperforming prior KG construction methods. Wikontic is also efficient at build time: KG construction uses less than 1,000 output tokens, about 3$\times$ fewer than AriGraph and $<$1/20 of GraphRAG. The proposed pipeline enhances the quality of the generated KG and offers a scalable solution for leveraging structured knowledge in LLMs.
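The normalization step that reduces entity duplication might look like the following sketch (casefolding, accent and punctuation stripping, whitespace collapsing); this is a generic stand-in, not Wikontic's actual normalizer.

```python
import re
import unicodedata

def normalize(name):
    """Canonicalize an entity surface form for duplicate detection."""
    s = unicodedata.normalize("NFKD", name)
    s = "".join(c for c in s if not unicodedata.combining(c))  # strip accents
    s = re.sub(r"[^\w\s]", "", s).casefold()                   # drop punctuation
    return re.sub(r"\s+", " ", s).strip()

def dedupe(triplets):
    """Keep one triplet per normalized (subject, predicate, object) key."""
    seen, out = set(), []
    for s, p, o in triplets:
        key = (normalize(s), p, normalize(o))
        if key not in seen:
            seen.add(key)
            out.append((s, p, o))
    return out

t = [("Marie Curie", "born_in", "Warsaw"),
     ("marie curie", "born_in", "Warsaw"),
     ("M. Curie", "born_in", "Warsaw")]
print(len(dedupe(t)))  # 2: the first two merge; "M. Curie" needs smarter matching
```

String-level canonicalization only merges trivial variants; the pipeline's Wikidata-based type and relation constraints are what handle the harder cases like abbreviated names.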
https://arxiv.org/abs/2512.00590
Large Language Models (LLMs) are increasingly being integrated into various components of Ontology Matching pipelines. This paper investigates the capability of LLMs to perform ontology matching directly on ontology modules and generate the corresponding alignments. Furthermore, it is explored how a dedicated fine-tuning strategy can enhance the model's matching performance in a zero-shot setting. The proposed method incorporates a search space reduction technique to select relevant subsets from both source and target ontologies, which are then used to automatically construct prompts. Recognizing the scarcity of reference alignments for training, a novel LLM-based approach is introduced for generating a synthetic dataset. This process creates a corpus of ontology submodule pairs and their corresponding reference alignments, specifically designed to fine-tune an LLM for the ontology matching task. The proposed approach was evaluated on the Conference, Geolink, Enslaved, Taxon, and Hydrography datasets from the OAEI complex track. The results demonstrate that the LLM fine-tuned on the synthetically generated data exhibits superior performance compared to the non-fine-tuned base model. The key contribution is a strategy that combines automatic dataset generation with fine-tuning to effectively adapt LLMs for ontology matching tasks.
https://arxiv.org/abs/2511.22612
The accelerating deployment of artificial intelligence systems across regulated sectors has exposed critical fragmentation in risk assessment methodologies. A significant "language barrier" currently separates technical security teams, who focus on algorithmic vulnerabilities (e.g., MITRE ATLAS), from legal and compliance professionals, who address regulatory mandates (e.g., EU AI Act, NIST AI RMF). This disciplinary disconnect prevents the accurate translation of technical vulnerabilities into financial liability, leaving practitioners unable to answer fundamental economic questions regarding contingency reserves, control return-on-investment, and insurance exposure. To bridge this gap, this research presents the AI System Threat Vector Taxonomy, a structured ontology designed explicitly for Quantitative Risk Assessment (QRA). The framework categorizes AI-specific risks into nine critical domains: Misuse, Poisoning, Privacy, Adversarial, Biases, Unreliable Outputs, Drift, Supply Chain, and IP Threat, integrating 53 operationally defined sub-threats. Uniquely, each domain maps technical vectors directly to business loss categories (Confidentiality, Integrity, Availability, Legal, Reputation), enabling the translation of abstract threats into measurable financial impact. The taxonomy is empirically validated through an analysis of 133 documented AI incidents from 2025 (achieving 100% classification coverage) and reconciled against the main AI risk frameworks. Furthermore, it is explicitly aligned with ISO/IEC 42001 controls and NIST AI RMF functions to facilitate auditability.
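The taxonomy's core move, mapping technical threat domains to business loss categories, can be expressed as a simple lookup. The specific domain-to-loss assignments below are our illustrative assumptions, not the published mapping.

```python
# Illustrative domain -> loss-category mapping (our assumptions, not the taxonomy's).
LOSS_MAP = {
    "Poisoning":          {"Integrity", "Reputation"},
    "Privacy":            {"Confidentiality", "Legal"},
    "Unreliable Outputs": {"Integrity", "Legal", "Reputation"},
}

def loss_exposure(incident_domains):
    """Union of business loss categories implicated by an incident's domains."""
    exposure = set()
    for d in incident_domains:
        exposure |= LOSS_MAP.get(d, set())
    return sorted(exposure)

print(loss_exposure(["Privacy", "Poisoning"]))
# ['Confidentiality', 'Integrity', 'Legal', 'Reputation']
```

With per-category loss distributions attached to each entry, this lookup is the bridge from a classified incident to the QRA quantities the abstract names: contingency reserves, control ROI, and insurance exposure.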
https://arxiv.org/abs/2511.21901
Building high-quality knowledge graphs (KGs) from diverse sources requires combining methods for information extraction, data transformation, ontology mapping, entity matching, and data fusion. Numerous methods and tools exist for each of these tasks, but support for combining them into reproducible and effective end-to-end pipelines is still lacking. We present KGpipe, a new framework for defining and executing integration pipelines that can combine existing tools or LLM (Large Language Model) functionality. To evaluate different pipelines and the resulting KGs, we propose a benchmark to integrate heterogeneous data of different formats (RDF, JSON, text) into a seed KG. We demonstrate the flexibility of KGpipe by running and comparatively evaluating several pipelines integrating sources of the same or different formats using selected performance and quality metrics.
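A minimal sketch of format-dispatching integration in the spirit of KGpipe (not its actual API): each source format gets its own extractor, and all outputs merge into one triple set. An RDF extractor is omitted here for brevity, and the text extractor is a deliberately trivial pattern matcher.

```python
import json

def from_json(src):
    """Turn one JSON record into triples keyed by its 'name' field."""
    d = json.loads(src)
    return {(d["name"], k, str(v)) for k, v in d.items() if k != "name"}

def from_text(src):
    """Trivial 'X is a Y' pattern extractor, standing in for real IE."""
    subj, _, obj = src.partition(" is a ")
    return {(subj, "type", obj.rstrip("."))} if obj else set()

EXTRACTORS = {"json": from_json, "text": from_text}

def integrate(sources):
    """Dispatch each (format, payload) source and fuse results into one KG."""
    kg = set()
    for fmt, payload in sources:
        kg |= EXTRACTORS[fmt](payload)
    return kg

kg = integrate([("json", '{"name": "Berlin", "country": "Germany"}'),
                ("text", "Berlin is a city.")])
print(sorted(kg))
```

Because extractors are registered per format, a pipeline mixing RDF, JSON, and text sources differs only in its `EXTRACTORS` table, which is the kind of reproducible composition the benchmark is meant to evaluate.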
https://arxiv.org/abs/2511.18364
This article presents a state-of-the-art review of recent advances aimed at transforming traditional Failure Mode and Effects Analysis (FMEA) into a more intelligent, data-driven, and semantically enriched process. As engineered systems grow in complexity, conventional FMEA methods, largely manual, document-centric, and expert-dependent, have become increasingly inadequate for addressing the demands of modern systems engineering. We examine how techniques from Artificial Intelligence (AI), including machine learning and natural language processing, can transform FMEA into a more dynamic, data-driven, intelligent, and model-integrated process by automating failure prediction, prioritisation, and knowledge extraction from operational data. In parallel, we explore the role of ontologies in formalising system knowledge, supporting semantic reasoning, improving traceability, and enabling cross-domain interoperability. The review also synthesises emerging hybrid approaches, such as ontology-informed learning and large language model integration, which further enhance explainability and automation. These developments are discussed within the broader context of Model-Based Systems Engineering (MBSE) and function modelling, showing how AI and ontologies can support more adaptive and resilient FMEA workflows. We critically analyse a range of tools, case studies, and integration strategies, while identifying key challenges related to data quality, explainability, standardisation, and interdisciplinary adoption. By leveraging AI, systems engineering, and knowledge representation using ontologies, this review offers a structured roadmap for embedding FMEA within intelligent, knowledge-rich engineering environments.
https://arxiv.org/abs/2511.17743
The Belief-Desire-Intention (BDI) model is a cornerstone for representing rational agency in artificial intelligence and cognitive sciences. Yet, its integration into structured, semantically interoperable knowledge representations remains limited. This paper presents a formal BDI Ontology, conceived as a modular Ontology Design Pattern (ODP) that captures the cognitive architecture of agents through beliefs, desires, intentions, and their dynamic interrelations. The ontology ensures semantic precision and reusability by aligning with foundational ontologies and best practices in modular design. Two complementary lines of experimentation demonstrate its applicability: (i) coupling the ontology with Large Language Models (LLMs) via Logic Augmented Generation (LAG) to assess the contribution of ontological grounding to inferential coherence and consistency; and (ii) integrating the ontology within the Semas reasoning platform, which implements the Triples-to-Beliefs-to-Triples (T2B2T) paradigm, enabling a bidirectional flow between RDF triples and agent mental states. Together, these experiments illustrate how the BDI Ontology acts as both a conceptual and operational bridge between declarative and procedural intelligence, paving the way for cognitively grounded, explainable, and semantically interoperable multi-agent and neuro-symbolic systems operating within the Web of Data.
https://arxiv.org/abs/2511.17162
Objective: This study introduces the Alzheimer's Disease Common Data Element Ontology for Clinical Trials (AD-CDO), a lightweight, semantically enriched ontology designed to represent and standardize key eligibility criteria concepts in Alzheimer's disease (AD) clinical trials.
Materials and Methods: We extracted high-frequency concepts from more than 1,500 AD clinical trials on this http URL and organized them into seven semantic categories: Disease, Medication, Diagnostic Test, Procedure, Social Determinants of Health, Rating Criteria, and Fertility. Each concept was annotated with standard biomedical vocabularies, including the UMLS, OMOP Standardized Vocabularies, DrugBank, NDC, and NLM VSAC value sets. To balance coverage and manageability, we applied the Jenks Natural Breaks method to identify an optimal set of representative concepts.
Results: The optimized AD-CDO achieved over 63% coverage of extracted trial concepts while maintaining interpretability and compactness. The ontology effectively captured the most frequent and clinically meaningful entities used in AD eligibility criteria. We demonstrated AD-CDO's practical utility through two use cases: (a) an ontology-driven trial simulation system for formal modeling and virtual execution of clinical trials, and (b) an entity normalization task mapping raw clinical text to ontology-aligned terms, enabling consistency and integration with EHR data.
Discussion: AD-CDO bridges the gap between broad biomedical ontologies and task-specific trial modeling needs. It supports multiple downstream applications, including phenotyping algorithm development, cohort identification, and structured data integration.
Conclusion: By harmonizing essential eligibility entities and aligning them with standardized vocabularies, AD-CDO provides a versatile foundation for ontology-driven AD clinical trial research.
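Jenks Natural Breaks, the method used here to pick a representative concept set, has a compact dynamic-programming formulation: partition sorted values into k classes minimizing within-class squared deviation. The implementation below is a generic O(k·n²) sketch on invented concept frequencies, not the authors' code.

```python
def jenks_breaks(values, k):
    """Minimal Jenks natural breaks via dynamic programming.
    Returns the upper boundary value of each of the k classes."""
    v = sorted(values)
    n = len(v)
    # Prefix sums let us get the squared deviation of any slice in O(1).
    pref, pref2 = [0.0], [0.0]
    for x in v:
        pref.append(pref[-1] + x)
        pref2.append(pref2[-1] + x * x)

    def ssd(i, j):  # within-class squared deviation of v[i:j]
        s = pref[j] - pref[i]
        return (pref2[j] - pref2[i]) - s * s / (j - i)

    INF = float("inf")
    cost = [[INF] * (k + 1) for _ in range(n + 1)]
    back = [[0] * (k + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for j in range(1, n + 1):
        for c in range(1, min(k, j) + 1):
            for i in range(c - 1, j):           # i = start of the last class
                cand = cost[i][c - 1] + ssd(i, j)
                if cand < cost[j][c]:
                    cost[j][c], back[j][c] = cand, i

    breaks, j = [], n
    for c in range(k, 0, -1):                    # walk the backpointers
        breaks.append(v[j - 1])
        j = back[j][c]
    return breaks[::-1]

# Invented concept-frequency counts with three natural tiers:
freqs = [1, 2, 2, 3, 40, 42, 45, 900, 950]
print(jenks_breaks(freqs, 3))  # [3, 45, 950]
```

Applied to trial-concept frequencies, the top break separates the high-frequency concepts worth keeping in the ontology from the long tail, which is how the coverage/compactness trade-off above is operationalized.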
https://arxiv.org/abs/2511.21724
Clinical named entity recognition (NER) is crucial for extracting information from electronic health records (EHRs), but supervised models like CRF and BioClinicalBERT require costly annotated data. While zero-shot NER with large language models (LLMs) reduces this dependency, it struggles with example selection granularity and integrating prompts with self-improvement. To address this, we propose OEMA, a zero-shot clinical NER framework using multi-agent collaboration. OEMA's three components are: a self-annotator generating examples, a discriminator filtering them via SNOMED CT, and a predictor using entity descriptions for accurate inference. On MTSamples and VAERS datasets, OEMA achieves state-of-the-art exact-match performance. Under related-match, it matches supervised BioClinicalBERT and surpasses CRF. OEMA addresses key zero-shot NER challenges through ontology-guided reasoning and multi-agent collaboration, achieving near-supervised performance and showing promise for clinical NLP applications.
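The three-agent loop can be caricatured in a few lines: an over-generating self-annotator, an ontology-backed discriminator, and a predictor that tags vetted entities. The agents below are rule-based stand-ins for OEMA's LLM agents, and `ONTOLOGY` is a two-entry mock, not SNOMED CT.

```python
# Mock ontology lookup (stand-in for SNOMED CT).
ONTOLOGY = {"fever": "Finding", "aspirin": "Product"}

def self_annotator(text):
    """Propose candidate entity examples; deliberately over-generates."""
    return [w.strip(".,") for w in text.split() if len(w) > 4]

def discriminator(candidates):
    """Keep only candidates that resolve against the ontology."""
    return [c for c in candidates if c.lower() in ONTOLOGY]

def predictor(text, examples):
    """Final pass: tag occurrences of vetted examples with semantic types."""
    return {e: ONTOLOGY[e.lower()] for e in examples if e in text}

text = "Patient developed fever after taking aspirin today."
examples = discriminator(self_annotator(text))
print(predictor(text, examples))  # {'fever': 'Finding', 'aspirin': 'Product'}
```

The division of labor is the point: noisy recall from the first agent is acceptable because the ontology-guided discriminator filters it before the predictor commits to labels, which is how the framework gets zero-shot precision without annotated data.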
https://arxiv.org/abs/2511.15211