The rapid advancement of transformer-based language models has catalyzed breakthroughs in biomedical and clinical natural language processing; however, plant science remains markedly underserved by such domain-adapted tools. In this work, we present PlantBert, a high-performance, open-source language model specifically tailored for extracting structured knowledge from plant stress-response literature. Built upon the DeBERTa architecture, known for its disentangled attention and robust contextual encoding, PlantBert is fine-tuned on a meticulously curated corpus of expert-annotated abstracts, with a primary focus on lentil (Lens culinaris) responses to diverse abiotic and biotic stressors. Our methodology combines transformer-based modeling with rule-enhanced linguistic post-processing and ontology-grounded entity normalization, enabling PlantBert to capture biologically meaningful relationships with precision and semantic fidelity. The underlying corpus is annotated using a hierarchical schema aligned with the Crop Ontology, encompassing molecular, physiological, biochemical, and agronomic dimensions of plant adaptation. PlantBert exhibits strong generalization capabilities across entity types and demonstrates the feasibility of robust domain adaptation in low-resource scientific fields. By providing a scalable and reproducible framework for high-resolution entity recognition, PlantBert bridges a critical gap in agricultural NLP and paves the way for intelligent, data-driven systems in plant genomics, phenomics, and agronomic knowledge discovery. Our model is publicly released to promote transparency and accelerate cross-disciplinary innovation in computational plant science.
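As a rough illustration of the entity-recognition step described above, the sketch below runs a DeBERTa-based token-classification model over a plant-stress sentence with Hugging Face transformers; the checkpoint name is a hypothetical placeholder for the released PlantBert weights, and the ontology-grounded normalization is only indicated in comments.

```python
# Minimal NER inference sketch with a DeBERTa-based token-classification model.
# The checkpoint name is hypothetical; substitute the released PlantBert weights.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_name = "your-org/plantbert-stress-ner"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")

text = ("Under drought stress, Lens culinaris accumulates proline and "
        "up-regulates expression of stress-response genes in roots.")  # example input

for ent in ner(text):
    # Each predicted entity would then be normalized against Crop Ontology terms
    # in the rule-based post-processing step.
    print(ent["entity_group"], ent["word"], round(ent["score"], 3))
```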
https://arxiv.org/abs/2506.08897
This study explores the neural and behavioral consequences of LLM-assisted essay writing. Participants were divided into three groups: LLM, Search Engine, and Brain-only (no tools). Each completed three sessions under the same condition. In a fourth session, LLM users were reassigned to the Brain-only group (LLM-to-Brain), and Brain-only users were reassigned to the LLM condition (Brain-to-LLM). A total of 54 participants took part in Sessions 1-3, with 18 completing Session 4. We used electroencephalography (EEG) to assess cognitive load during essay writing, analyzed the essays using NLP, and scored them with help from human teachers and an AI judge. Across groups, named-entity (NER) usage, n-gram patterns, and topic ontology showed within-group homogeneity. EEG revealed significant differences in brain connectivity: Brain-only participants exhibited the strongest, most distributed networks; Search Engine users showed moderate engagement; and LLM users displayed the weakest connectivity. Cognitive activity scaled down in relation to external tool use. In Session 4, LLM-to-Brain participants showed reduced alpha and beta connectivity, indicating under-engagement. Brain-to-LLM users exhibited higher memory recall and activation of occipito-parietal and prefrontal areas, similar to Search Engine users. Self-reported ownership of essays was the lowest in the LLM group and the highest in the Brain-only group. LLM users also struggled to accurately quote their own work. While LLMs offer immediate convenience, our findings highlight potential cognitive costs. Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels. These results raise concerns about the long-term educational implications of LLM reliance and underscore the need for deeper inquiry into AI's role in learning.
https://arxiv.org/abs/2506.08872
Having a unified, coherent taxonomy is essential for effective knowledge representation in domain-specific applications as diverse terminologies need to be mapped to underlying concepts. Traditional manual approaches to taxonomy alignment rely on expert review of concept pairs, but this becomes prohibitively expensive and time-consuming at scale, while subjective interpretations often lead to expert disagreements. Existing automated methods for taxonomy alignment have shown promise but face limitations in handling nuanced semantic relationships and maintaining consistency across different domains. These approaches often struggle with context-dependent concept mappings and lack transparent reasoning processes. We propose a novel framework that combines large language models (LLMs) with expert calibration and iterative prompt optimization to automate taxonomy alignment. Our method integrates expert-labeled examples, multi-stage prompt engineering, and human validation to guide LLMs in generating both taxonomy linkages and supporting rationales. In evaluating our framework on a domain-specific mapping task of concept essentiality, we achieved an F1-score of 0.97, substantially exceeding the human benchmark of 0.68. These results demonstrate the effectiveness of our approach in scaling taxonomy alignment while maintaining high-quality mappings and preserving expert oversight for ambiguous cases.
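To illustrate how expert-labeled examples can steer an LLM toward consistent taxonomy linkages, here is a minimal few-shot prompt-construction sketch; the relationship labels, example pairs, and wording are illustrative assumptions, not the paper's actual prompts or calibration data.

```python
# Few-shot prompt construction for LLM-based taxonomy alignment.
# Labels, example pairs, and wording are illustrative only.
EXPERT_EXAMPLES = [
    ("acute renal failure", "acute kidney injury", "equivalent"),
    ("renal disease", "acute kidney injury", "broader-than"),
]

def build_alignment_prompt(source_concept: str, target_concept: str) -> str:
    lines = [
        "Decide the relationship between two taxonomy concepts.",
        "Answer with one of: equivalent, broader-than, narrower-than, unrelated,",
        "followed by a one-sentence rationale.",
        "",
    ]
    for src, tgt, label in EXPERT_EXAMPLES:  # expert-calibrated examples
        lines.append(f"Source: {src}\nTarget: {tgt}\nAnswer: {label}")
        lines.append("")
    lines.append(f"Source: {source_concept}\nTarget: {target_concept}\nAnswer:")
    return "\n".join(lines)

print(build_alignment_prompt("myocardial infarction", "heart attack"))
```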
https://arxiv.org/abs/2506.08422
This paper addresses the scarcity of low-cost but high-dexterity platforms for collecting real-world multi-fingered robot manipulation data towards generalist robot autonomy. To achieve it, we propose the RAPID Hand, a co-optimized hardware and software platform where the compact 20-DoF hand, robust whole-hand perception, and high-DoF teleoperation interface are jointly designed. Specifically, RAPID Hand adopts a compact and practical hand ontology and a hardware-level perception framework that stably integrates wrist-mounted vision, fingertip tactile sensing, and proprioception with sub-7 ms latency and spatial alignment. Collecting high-quality demonstrations on high-DoF hands is challenging, as existing teleoperation methods struggle with precision and stability on complex multi-fingered systems. We address this by co-optimizing hand design, perception integration, and teleoperation interface through a universal actuation scheme, custom perception electronics, and two retargeting constraints. We evaluate the platform's hardware, perception, and teleoperation interface. Training a diffusion policy on collected data shows superior performance over prior works, validating the system's capability for reliable, high-quality data collection. The platform is constructed from low-cost and off-the-shelf components and will be made public to ensure reproducibility and ease of adoption.
https://arxiv.org/abs/2506.07490
There are many types of standards in the field of communication. The traditional consulting model has a long cycle and relies on the knowledge and experience of experts, making it difficult to meet rapidly developing technological demands. This paper combines the fine-tuning of large language models with the construction of knowledge graphs to implement an intelligent consultation and question-answering system for communication standards. The experimental results show that after LoRA tuning on the constructed dataset of 6,587 questions and answers in the field of communication standards, Qwen2.5-7B-Instruct demonstrates outstanding professional capabilities in the field of communication standards on the test set. BLEU-4 rose from 18.8564 to 66.8993, and evaluation indicators such as ROUGE also increased significantly, outperforming the fine-tuned comparison model Llama-3-8B-Instruct. Based on the ontology framework containing 6 entity attributes and 10 relation attributes, a knowledge graph of the communication standard domain containing 13,906 entities and 13,524 relations was constructed, showing relatively good query accuracy. The intelligent consultation and question-answering system enables the fine-tuned model on the server side to access the locally constructed knowledge graph and first retrieve key information from the graph, which helps improve question-answering quality. Evaluation with DeepSeek as the judge on the test set shows that our RAG framework improves the fine-tuned model's scores on all five criteria, with an average increase of 2.26%. Combined with web services and API interfaces, the system delivers a good interaction experience and back-end access, and has strong practical application value.
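A minimal sketch of the LoRA setup described above, using the PEFT library with Qwen2.5-7B-Instruct; the rank, alpha, dropout, and target modules are illustrative defaults, not the paper's reported configuration.

```python
# Sketch of attaching LoRA adapters to Qwen2.5-7B-Instruct with PEFT.
# Hyperparameters and target modules are illustrative, not the paper's settings.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trained
```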
https://arxiv.org/abs/2506.07037
Educational, learning, and training materials have become extremely commonplace across the Internet. Yet they frequently remain disconnected from each other and siloed within individual platforms. One way to overcome this is to provide a mechanism to integrate the material and provide cross-links across topics. In this paper, we present the Curriculum KG Ontology, which we use as a framework for the dense interlinking of educational materials, starting from organizational and broad pedagogical principles. We provide a materialized graph for the Prototype Open Knowledge Network use-case, and validate it using competency questions sourced from domain experts and educators.
https://arxiv.org/abs/2506.05751
Zero-shot Event Detection (ED), the task of identifying event mentions in natural language text without any training data, is critical for document understanding in specialized domains. Understanding the complex event ontology, extracting domain-specific triggers from the passage, and structuring them appropriately overload and limit the utility of Large Language Models (LLMs) for zero-shot ED. To this end, we propose DiCoRe, a divergent-convergent reasoning framework that decouples the task of ED using Dreamer and Grounder. Dreamer encourages divergent reasoning through open-ended event discovery, which helps to boost event coverage. Conversely, Grounder introduces convergent reasoning to align the free-form predictions with the task-specific instructions using finite-state machine guided constrained decoding. Additionally, an LLM-Judge verifies the final outputs to ensure high precision. Through extensive experiments on six datasets across five domains and nine LLMs, we demonstrate how DiCoRe consistently outperforms prior zero-shot, transfer-learning, and reasoning baselines, achieving 4-7% average F1 gains over the best baseline, establishing DiCoRe as a strong zero-shot ED framework.
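The following is a heavily simplified stand-in for the divergent-convergent split: a Dreamer step that freely proposes event mentions and a Grounder step that snaps them onto a closed label set. The paper uses FSM-guided constrained decoding with an LLM; plain string matching below is only a placeholder for that convergent stage, and the ontology labels and candidate mentions are invented.

```python
# Toy stand-in for DiCoRe's two stages. Dreamer would be an LLM listing events
# freely; Grounder would use FSM-constrained decoding. Here both are mocked so
# the control flow is visible.
import difflib

EVENT_ONTOLOGY = ["Attack", "Transport", "Meet", "Elect", "Die"]  # illustrative labels

def dreamer(passage: str) -> list[str]:
    # In the real system an LLM proposes open-ended event mentions from the passage.
    return ["armed assault on the convoy", "troops moved to the border"]

def grounder(candidates: list[str]) -> list[str]:
    # Placeholder for convergent, ontology-constrained decoding: map each free-form
    # candidate to the closest closed-set label.
    grounded = []
    for cand in candidates:
        match = difflib.get_close_matches(cand, EVENT_ONTOLOGY, n=1, cutoff=0.0)
        grounded.append(match[0])
    return grounded

print(grounder(dreamer("...")))  # an LLM-Judge would then verify these outputs
```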
https://arxiv.org/abs/2506.05128
Medical artificial intelligence (AI) systems frequently lack systematic domain expertise integration, potentially compromising diagnostic reliability. This study presents an ontology-based framework for bone disease diagnosis, developed in collaboration with Ho Chi Minh City Hospital for Traumatology and Orthopedics. The framework introduces three theoretical contributions: (1) a hierarchical neural network architecture guided by bone disease ontology for segmentation-classification tasks, incorporating Visual Language Models (VLMs) through prompts, (2) an ontology-enhanced Visual Question Answering (VQA) system for clinical reasoning, and (3) a multimodal deep learning model that integrates imaging, clinical, and laboratory data through ontological relationships. The methodology maintains clinical interpretability through systematic knowledge digitization, standardized medical terminology mapping, and modular architecture design. The framework demonstrates potential for extension beyond bone diseases through its standardized structure and reusable components. While theoretical foundations are established, experimental validation remains pending due to current dataset and computational resource limitations. Future work will focus on expanding the clinical dataset and conducting comprehensive system validation.
https://arxiv.org/abs/2506.04756
Schemas are vital for ensuring data quality in the Semantic Web and natural language processing. Traditionally, their creation demands substantial involvement from knowledge engineers and domain experts. Leveraging the impressive capabilities of large language models (LLMs) in related tasks like ontology engineering, we explore automatic schema generation using LLMs. To bridge the resource gap, we introduce two datasets: YAGO Schema and Wikidata EntitySchema, along with evaluation metrics. The LLM-based pipelines effectively utilize local and global information from knowledge graphs (KGs) to generate validating schemas in Shape Expressions (ShEx). Experiments demonstrate LLMs' strong potential in producing high-quality ShEx schemas, paving the way for scalable, automated schema generation for large KGs. Furthermore, our benchmark introduces a new challenge for structured generation, pushing the limits of LLMs on syntactically rich formalisms.
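As a sketch of the kind of local KG evidence such a pipeline can feed to the LLM, and of the ShEx format it targets, consider the following; the class, properties, and shape are illustrative, not drawn from the YAGO or Wikidata datasets.

```python
# Gather local KG signal (property usage on instances of a class) of the kind an
# LLM prompt for ShEx generation might consume, plus an example target ShEx shape.
from collections import Counter
from rdflib import Graph, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:name "Alice" ; ex:birthYear 1990 .
ex:bob   a ex:Person ; ex:name "Bob" .
""", format="turtle")

usage = Counter(p for s in g.subjects(RDF.type, EX.Person)
                  for p, _ in g.predicate_objects(s))
print(usage)  # property frequencies hint at cardinalities (birthYear is optional here)

EXAMPLE_SHEX = """
PREFIX ex: <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
ex:PersonShape {
  ex:name xsd:string ;
  ex:birthYear xsd:integer ?
}
"""  # the kind of validating shape the LLM pipeline is asked to emit
```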
https://arxiv.org/abs/2506.04512
Person Re-Identification (Re-ID) is an important task in video surveillance systems, with applications such as tracking people, finding people in public places, or analysing customer behavior in supermarkets. Although many works have addressed this problem, challenges remain, such as large-scale datasets, imbalanced data, viewpoint variation, and fine-grained data (attributes); moreover, local features are not employed at the semantic level in the online stage of Re-ID, and the imbalanced-data problem of attributes is not taken into consideration. This paper proposes a Unified Re-ID system consisting of three main modules: Pedestrian Attribute Ontology (PAO), Local Multi-task DCNN (Local MDCNN), and Imbalance Data Solver (IDS). The novelty of our Re-ID system lies in the mutual support of PAO, Local MDCNN, and IDS, which exploits the inner-group correlations of attributes and pre-filters mismatched candidates from the gallery set based on semantic information such as fashion and facial attributes, solving the imbalanced-data problem of attributes without adjusting the network architecture or using data augmentation. We experimented on the well-known Market1501 dataset. The experimental results show the effectiveness of our Re-ID system, which achieves higher performance on Market1501 than some state-of-the-art Re-ID methods.
https://arxiv.org/abs/2506.04143
This study presents an approach that uses large language models such as GPT-4 to generate usage policies in the W3C Open Digital Rights Language (ODRL) automatically from natural language instructions. Our approach uses the ODRL ontology and its documentation as a central part of the prompt. Our research hypothesis is that a curated version of the existing ontology documentation will better guide policy generation. We present various heuristics for adapting the ODRL ontology and its documentation to guide an end-to-end KG construction process. We evaluate our approach in the context of dataspaces, i.e., distributed infrastructures for trustworthy data exchange between multiple participating organizations, applied here to the cultural domain. We created a benchmark consisting of 12 use cases of varying complexity. Our evaluation shows excellent results with up to 91.95% accuracy in the resulting knowledge graph.
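For orientation, the following shows the kind of ODRL policy (in its JSON-LD serialization) that such a prompt-based pipeline is asked to produce from a natural-language instruction; the parties, asset, action, and identifiers are illustrative placeholders rather than examples from the benchmark.

```python
# Example target output: an ODRL Agreement in JSON-LD derived from a natural-language
# instruction. All URIs and values are illustrative placeholders.
import json

instruction = "Museum A grants Museum B permission to display the dataset until 2026-01-01."

policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Agreement",
    "uid": "http://example.com/policy/0001",
    "permission": [{
        "target": "http://example.com/asset/cultural-dataset",
        "action": "display",
        "assigner": "http://example.com/party/museum-a",
        "assignee": "http://example.com/party/museum-b",
        "constraint": [{
            "leftOperand": "dateTime",
            "operator": "lt",
            "rightOperand": {"@value": "2026-01-01", "@type": "xsd:date"}
        }]
    }]
}
print(json.dumps(policy, indent=2))
```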
https://arxiv.org/abs/2506.03301
Neuroscience research publications encompass a vast wealth of knowledge. Accurately retrieving existing information and discovering new insights from this extensive literature is essential for advancing the field. However, when knowledge is dispersed across multiple sources, current state-of-the-art retrieval methods often struggle to extract the necessary information. A knowledge graph (KG) can integrate and link knowledge from multiple sources, but existing methods for constructing KGs in neuroscience often rely on labeled data and require domain expertise. Acquiring large-scale, labeled data for a specialized area like neuroscience presents significant challenges. This work proposes novel methods for constructing a KG from an unlabeled, large-scale neuroscience research corpus utilizing large language models (LLMs), a neuroscience ontology, and text embeddings. We analyze the semantic relevance of neuroscience text segments identified by the LLM for building the knowledge graph. We also introduce an entity-augmented information retrieval algorithm to extract knowledge from the KG. Several experiments were conducted to evaluate the proposed approaches, and the results demonstrate that our methods significantly enhance knowledge discovery from the unlabeled neuroscience research corpus. The approach achieves an F1 score of 0.84 for entity extraction, and the knowledge obtained from the KG improves answers to over 54% of the questions.
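A small sketch of the embedding-based relevance check mentioned above: candidate text segments are scored against ontology terms by cosine similarity, and low-scoring segments can be filtered out before KG construction. The embedding model and the terms are illustrative choices, not the paper's.

```python
# Score the semantic relevance of LLM-identified text segments against ontology
# terms using text embeddings; model name and terms are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

ontology_terms = ["hippocampus", "long-term potentiation", "dopaminergic neuron"]
segments = [
    "NMDA receptor activation in CA1 is required for LTP induction.",
    "The grant supported three postdoctoral researchers.",
]

term_emb = model.encode(ontology_terms, convert_to_tensor=True)
seg_emb = model.encode(segments, convert_to_tensor=True)
scores = util.cos_sim(seg_emb, term_emb)  # rows: segments, cols: ontology terms

for seg, row in zip(segments, scores):
    print(round(float(row.max()), 3), seg)  # keep segments above a relevance threshold
```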
https://arxiv.org/abs/2506.03145
Transforming relational databases into knowledge graphs with enriched ontologies enhances semantic interoperability and unlocks advanced graph-based learning and reasoning over data. However, previous approaches either demand significant manual effort to derive an ontology from a database schema or produce only a basic ontology. We present RIGOR, Retrieval-augmented Iterative Generation of RDB Ontologies, an LLM-driven approach that turns relational schemas into rich OWL ontologies with minimal human effort. RIGOR combines three sources via RAG (the database schema and its documentation, a repository of domain ontologies, and a growing core ontology) to prompt a generative LLM for producing successive, provenance-tagged delta ontology fragments. Each fragment is refined by a judge-LLM before being merged into the core ontology, and the process iterates table-by-table following foreign key constraints until coverage is complete. Applied to real-world databases, our approach outputs ontologies that score highly on standard quality dimensions such as accuracy, completeness, conciseness, adaptability, clarity, and consistency, while substantially reducing manual effort.
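The table-by-table iteration can be pictured as a topological traversal of the foreign-key graph, so that referenced tables (and their ontology fragments) are handled before the tables that point to them; the schema below is an invented example, and the LLM/RAG steps are only indicated in comments.

```python
# Sketch of the table-by-table iteration order: process tables only after the
# tables they reference via foreign keys, so each delta ontology fragment can
# reuse classes already merged into the core ontology. Schema is illustrative.
from graphlib import TopologicalSorter

# table -> set of tables it references via foreign keys
foreign_keys = {
    "order_item": {"order", "product"},
    "order": {"customer"},
    "product": set(),
    "customer": set(),
}

for table in TopologicalSorter(foreign_keys).static_order():
    # 1) retrieve schema docs + matching domain-ontology snippets (RAG)
    # 2) prompt the generator LLM for a provenance-tagged delta fragment
    # 3) have the judge-LLM refine it, then merge it into the core ontology
    print("processing table:", table)
```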
https://arxiv.org/abs/2506.01232
We propose DeepRAG, a novel framework that integrates DeepSeek's hierarchical question decomposition capabilities with RAG Gym's unified retrieval-augmented generation optimization using process-level supervision. Targeting the challenging MedHopQA biomedical question answering task, DeepRAG systematically decomposes complex queries into precise sub-queries and employs concept-level reward signals informed by the UMLS ontology to enhance biomedical accuracy. Preliminary evaluations on the MedHopQA dataset indicate that DeepRAG significantly outperforms baseline models, including standalone DeepSeek and RAG Gym, achieving notable improvements in both Exact Match and concept-level accuracy.
https://arxiv.org/abs/2506.00671
Ontologies are pivotal for structuring knowledge bases to enhance question answering (QA) systems powered by Large Language Models (LLMs). However, traditional ontology creation relies on manual efforts by domain experts, a process that is time-intensive, error-prone, and impractical for large, dynamic knowledge domains. This paper introduces OntoRAG, an automated pipeline designed to derive ontologies from unstructured knowledge bases, with a focus on electrical relay documents. OntoRAG integrates advanced techniques, including web scraping, PDF parsing, hybrid chunking, information extraction, knowledge graph construction, and ontology creation, to transform unstructured data into a queryable ontology. By leveraging LLMs and graph-based methods, OntoRAG enhances global sensemaking capabilities, outperforming conventional Retrieval-Augmented Generation (RAG) and GraphRAG approaches in comprehensiveness and diversity. Experimental results demonstrate OntoRAG's effectiveness, achieving a comprehensiveness win rate of 85% against vector RAG and 75% against GraphRAG's best configuration. This work addresses the critical challenge of automating ontology creation, advancing the vision of the semantic web.
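As one concrete reading of the hybrid chunking step, the sketch below first splits on structural boundaries and then applies a sliding window with overlap to over-long blocks; the word budget and overlap are illustrative values, not OntoRAG's configuration.

```python
# Hybrid chunking sketch: structural split first, then a size-bounded sliding
# window with overlap for long blocks. Budget values are illustrative.
def hybrid_chunk(text: str, max_words: int = 120, overlap: int = 20) -> list[str]:
    chunks = []
    for block in text.split("\n\n"):        # structural pass: paragraphs/sections
        words = block.split()
        if len(words) <= max_words:
            if words:
                chunks.append(" ".join(words))
            continue
        start = 0                            # sliding-window pass for long blocks
        while start < len(words):
            chunks.append(" ".join(words[start:start + max_words]))
            start += max_words - overlap
    return chunks

doc = "Relay section 1." + " word" * 300 + "\n\nRelay section 2."
print(len(hybrid_chunk(doc)))
```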
https://arxiv.org/abs/2506.00664
The availability of Large Language Models (LLMs) presents a unique opportunity to reinvigorate research on Knowledge Engineering (KE) automation, a trend already evident in recent efforts developing LLM-based methods and tools for the automatic generation of Competency Questions (CQs). However, the evaluation of these tools lacks standardisation. This undermines the methodological rigour and hinders the replication and comparison of results. To address this gap, we introduce Bench4KE, an extensible API-based benchmarking system for KE automation. Its first release focuses on evaluating tools that generate CQs automatically. CQs are natural language questions used by ontology engineers to define the functional requirements of an ontology. Bench4KE provides a curated gold standard consisting of CQ datasets from four real-world ontology projects. It uses a suite of similarity metrics to assess the quality of the CQs generated. We present a comparative analysis of four recent CQ generation systems, which are based on LLMs, establishing a baseline for future research. Bench4KE is also designed to accommodate additional KE automation tasks, such as SPARQL query generation, ontology testing and drafting. Code and datasets are publicly available under the Apache 2.0 license.
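To illustrate the evaluation idea, the snippet below scores generated competency questions against a gold set with a simple lexical similarity; Bench4KE itself combines a suite of similarity metrics, and the example questions here are invented.

```python
# Toy CQ-quality check: best-match lexical similarity of each generated question
# against the gold standard. Questions are invented; Bench4KE uses richer metrics.
from difflib import SequenceMatcher

gold = [
    "Which courses cover a given learning objective?",
    "Which prerequisite topics does a module depend on?",
]
generated = [
    "What courses address a specific learning objective?",
    "Which topics must be completed before a module?",
]

def best_match_score(candidate: str, references: list[str]) -> float:
    return max(SequenceMatcher(None, candidate.lower(), ref.lower()).ratio()
               for ref in references)

for cq in generated:
    print(round(best_match_score(cq, gold), 2), cq)
```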
https://arxiv.org/abs/2505.24554
Adaptive navigation in unfamiliar environments is crucial for household service robots but remains challenging due to the need for both low-level path planning and high-level scene understanding. While recent vision-language model (VLM) based zero-shot approaches reduce dependence on prior maps and scene-specific training data, they face significant limitations: spatiotemporal discontinuity from discrete observations, unstructured memory representations, and insufficient task understanding leading to navigation failures. We propose DORAEMON (Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation), a novel cognitive-inspired framework consisting of Ventral and Dorsal Streams that mimics human navigation capabilities. The Dorsal Stream implements the Hierarchical Semantic-Spatial Fusion and Topology Map to handle spatiotemporal discontinuities, while the Ventral Stream combines RAG-VLM and Policy-VLM to improve decision-making. Our approach also develops Nav-Ensurance to ensure navigation safety and efficiency. We evaluate DORAEMON on the HM3D, MP3D, and GOAT datasets, where it achieves state-of-the-art performance on both success rate (SR) and success weighted by path length (SPL) metrics, significantly outperforming existing methods. We also introduce a new evaluation metric (AORI) to assess navigation intelligence better. Comprehensive experiments demonstrate DORAEMON's effectiveness in zero-shot autonomous navigation without requiring prior map building or pre-training.
https://arxiv.org/abs/2505.21969
As AI systems increasingly navigate applications in healthcare, law, and governance, understanding how they handle ethically complex scenarios becomes critical. Previous work has mainly examined the moral judgments in large language models (LLMs), rather than their underlying moral reasoning process. In contrast, we focus on a large-scale analysis of the moral reasoning traces provided by LLMs. Furthermore, unlike prior work that attempted to draw inferences from only a handful of moral dilemmas, our study leverages over 600 distinct trolley problems as probes for revealing the reasoning patterns that emerge within different LLMs. We introduce and test a taxonomy of moral rationales to systematically classify reasoning traces according to two main normative ethical theories: consequentialism and deontology. Our analysis reveals that LLM chains-of-thought tend to favor deontological principles based on moral obligations, while post-hoc explanations shift notably toward consequentialist rationales that emphasize utility. Our framework provides a foundation for understanding how LLMs process and articulate ethical considerations, an important step toward safe and interpretable deployment of LLMs in high-stakes decision-making environments. Our code is available at this https URL .
https://arxiv.org/abs/2505.21479
Knowledge Graphs (KGs) are increasingly adopted as a foundational technology for integrating heterogeneous data in domains such as climate science, cultural heritage, and the life sciences. Declarative mapping languages like R2RML and RML have played a central role in enabling scalable and reusable KG construction, offering a transparent means of transforming structured and semi-structured data into RDF. In this paper, we present PyRML, a lightweight, Python-native library for building Knowledge Graphs through declarative mappings. PyRML supports core RML constructs and provides a programmable interface for authoring, executing, and testing mappings directly within Python environments. It integrates with popular data and semantic web libraries (e.g., Pandas and RDFlib), enabling transparent and modular workflows. By lowering the barrier to entry for KG creation and fostering reproducible, ontology-aligned data integration, PyRML bridges the gap between declarative semantics and practical KG engineering.
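To make the underlying transformation concrete, here is the row-to-triples step that a declarative RML mapping describes, written out by hand with Pandas and RDFlib; this is not PyRML's API, just the kind of transformation such a mapping automates, with an invented table and vocabulary.

```python
# Hand-written equivalent of a tiny declarative mapping: each row of a DataFrame
# becomes RDF triples. Illustrative data and predicates; not PyRML's API.
import pandas as pd
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import FOAF, XSD

EX = Namespace("http://example.org/")
df = pd.DataFrame([{"id": "p1", "name": "Ada", "born": 1815}])

g = Graph()
g.bind("foaf", FOAF)
for _, row in df.iterrows():
    subject = EX[f"person/{row['id']}"]               # subject template
    g.add((subject, RDF.type, FOAF.Person))           # class mapping
    g.add((subject, FOAF.name, Literal(row["name"]))) # column -> property
    g.add((subject, EX.birthYear, Literal(int(row["born"]), datatype=XSD.gYear)))

print(g.serialize(format="turtle"))
```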
https://arxiv.org/abs/2505.20949
Knowledge graphs offer an excellent solution for representing the lexical-semantic structures of lexicographic data. However, working with the SPARQL query language represents a considerable hurdle for many non-expert users who could benefit from the advantages of this technology. This paper addresses the challenge of creating natural language interfaces for lexicographic data retrieval on knowledge graphs such as Wikidata. We develop a multidimensional taxonomy capturing the complexity of Wikidata's lexicographic data ontology module through four dimensions and create a template-based dataset with over 1.2 million mappings from natural language utterances to SPARQL queries. Our experiments with GPT-2 (124M), Phi-1.5 (1.3B), and GPT-3.5-Turbo reveal significant differences in model capabilities. While all models perform well on familiar patterns, only GPT-3.5-Turbo demonstrates meaningful generalization capabilities, suggesting that model size and diverse pre-training are crucial for adaptability in this domain. However, significant challenges remain in achieving robust generalization, handling diverse linguistic data, and developing scalable solutions that can accommodate the full complexity of lexicographic knowledge representation.
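For context, this is the kind of SPARQL query the natural-language interface must produce for Wikidata's lexicographic data module (here: English-language lexemes that are nouns); it assumes the prefixes predefined on the Wikidata Query Service endpoint, and the user-agent string is a placeholder.

```python
# Example target query for the NL-to-SPARQL task over Wikidata lexemes.
# Q1860 = English, Q1084 = noun; wd/dct/wikibase/ontolex prefixes are predefined on WDQS.
from SPARQLWrapper import SPARQLWrapper, JSON

query = """
SELECT ?lexeme ?lemma WHERE {
  ?lexeme a ontolex:LexicalEntry ;
          dct:language wd:Q1860 ;
          wikibase:lexicalCategory wd:Q1084 ;
          wikibase:lemma ?lemma .
} LIMIT 5
"""

endpoint = SPARQLWrapper("https://query.wikidata.org/sparql",
                         agent="lexeme-example/0.1")  # placeholder user agent
endpoint.setQuery(query)
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["lexeme"]["value"], row["lemma"]["value"])
```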
https://arxiv.org/abs/2505.19971