The rapid acceleration of scientific publishing has created substantial challenges for researchers attempting to discover, contextualize, and interpret relevant literature. Traditional keyword-based search systems provide limited semantic understanding, while existing AI-driven tools typically focus on isolated tasks such as retrieval, clustering, or bibliometric visualization. This paper presents an integrated system for scientific literature exploration that combines large-scale data acquisition, hybrid retrieval, semantic topic modeling, and heterogeneous knowledge graph construction. The system builds a comprehensive corpus by merging full-text data from arXiv with structured metadata from OpenAlex. A hybrid retrieval architecture fuses BM25 lexical search with embedding-based semantic search using Reciprocal Rank Fusion. Topic modeling is performed on retrieved results using BERTopic or non-negative matrix factorization depending on computational resources. A knowledge graph unifies papers, authors, institutions, countries, and extracted topics into an interpretable structure. The system provides a multi-layered exploration environment that reveals not only relevant publications but also the conceptual and relational landscape surrounding a query. Evaluation across multiple queries demonstrates improvements in retrieval relevance, topic coherence, and interpretability. The proposed framework contributes an extensible foundation for AI-assisted scientific discovery.
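The fusion step named above can be sketched with the standard Reciprocal Rank Fusion formula, where each document scores the sum of 1/(k + rank) over the rankers that return it. The abstract names RRF but gives no constants, so k = 60 (the commonly used default) and the toy BM25/dense rankings below are illustrative, not from the system:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from the lexical and semantic retrievers.
bm25_hits = ["p1", "p2", "p3"]
dense_hits = ["p3", "p1", "p4"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Documents ranked well by both retrievers (here `p1` and `p3`) rise to the top even when neither ranker alone puts them first.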
https://arxiv.org/abs/2512.12760
Agentic memory is emerging as a key enabler for large language models (LLMs) to maintain continuity, personalization, and long-term context in extended user interactions; these capabilities are critical for deploying LLMs as truly interactive and adaptive agents. Agentic memory refers to memory that provides an LLM with agent-like persistence: the ability to retain and act upon information across conversations, much as a human would. We present Memoria, a modular memory framework that augments LLM-based conversational systems with persistent, interpretable, and context-rich memory. Memoria integrates two complementary components: dynamic session-level summarization and a weighted knowledge graph (KG)-based user modelling engine that incrementally captures user traits, preferences, and behavioral patterns as structured entities and relationships. This hybrid architecture enables both short-term dialogue coherence and long-term personalization while operating within the token constraints of modern LLMs. We demonstrate how Memoria enables scalable, personalized conversational artificial intelligence (AI) by bridging the gap between stateless LLM interfaces and agentic memory systems, offering a practical solution for industry applications requiring adaptive and evolving user experiences.
https://arxiv.org/abs/2512.12686
Temporal Knowledge Graph Reasoning (TKGR) aims to complete missing factual elements along the timeline. Depending on the temporal position of the query, the task is categorized into interpolation and extrapolation. Existing interpolation methods typically embed temporal information into individual facts to complete missing historical knowledge, while extrapolation techniques often leverage sequence models over graph snapshots to identify recurring patterns for future event prediction. These methods face two critical challenges: limited contextual modeling in interpolation and cognitive generalization bias in extrapolation. To address these, we propose a unified method for TKGR, dubbed DynaGen. For interpolation, DynaGen dynamically constructs entity-centric subgraphs and processes them with a synergistic dual-branch GNN encoder to capture evolving structural context. For extrapolation, it applies a conditional diffusion process, which forces the model to learn underlying evolutionary principles rather than just superficial patterns, enhancing its ability to predict unseen future events. Extensive experiments on six benchmark datasets show DynaGen achieves state-of-the-art performance. On average, compared to the second-best models, DynaGen improves the Mean Reciprocal Rank (MRR) score by 2.61 points for interpolation and 1.45 points for extrapolation.
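The reported metric, Mean Reciprocal Rank, averages the reciprocal rank of the first correct answer across queries; a point of MRR as cited here corresponds to 0.01 on this 0-1 scale times 100. A minimal computation (the query lists and gold answers are illustrative):

```python
def mean_reciprocal_rank(ranked_candidates, gold):
    """MRR: average over queries of 1 / rank of the first correct answer.
    Queries whose answer never appears contribute 0."""
    total = 0.0
    for candidates, answer in zip(ranked_candidates, gold):
        if answer in candidates:
            total += 1.0 / (candidates.index(answer) + 1)
    return total / len(gold)
```

For two queries whose answers land at ranks 1 and 2, MRR is (1 + 0.5) / 2 = 0.75.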
https://arxiv.org/abs/2512.12669
Sparse Knowledge Graphs (KGs) are commonly encountered in real-world applications, where knowledge is often incomplete or limited. Sparse KG reasoning, the task of inferring missing knowledge over sparse KGs, is inherently challenging due to the scarcity of knowledge and the difficulty of capturing relational patterns in sparse scenarios. Among all sparse KG reasoning methods, path-based ones have attracted plenty of attention due to their interpretability. Existing path-based methods typically rely on computationally intensive random walks to collect paths, producing paths of variable quality. Additionally, these methods fail to leverage the structured nature of graphs by treating paths independently. To address these shortcomings, we propose a Structural and Probabilistic framework named StruProKGR, tailored for efficient and interpretable reasoning on sparse KGs. StruProKGR utilizes a distance-guided path collection mechanism to significantly reduce computational costs while exploring more relevant paths. It further enhances the reasoning process by incorporating structural information through probabilistic path aggregation, which prioritizes paths that reinforce each other. Extensive experiments on five sparse KG reasoning benchmarks reveal that StruProKGR surpasses existing path-based methods in both effectiveness and efficiency, providing an effective, efficient, and interpretable solution for sparse KG reasoning.
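As a rough sketch of bounded path collection over a KG: the paper's distance-guided mechanism is a more selective replacement for random walks, but a plain breadth-first enumeration up to a hop limit shows the shape of the task. The toy KG and relation names below are invented:

```python
from collections import deque

def collect_paths(graph, start, goal, max_hops):
    """Enumerate relation paths from start to goal within max_hops hops.
    A plain breadth-first enumeration; distance guidance would prune
    expansions that cannot still reach the goal in the remaining budget."""
    paths = []
    queue = deque([(start, [])])
    while queue:
        node, rels = queue.popleft()
        if node == goal and rels:
            paths.append(tuple(rels))
            continue
        if len(rels) == max_hops:
            continue
        for rel, nxt in graph.get(node, []):
            queue.append((nxt, rels + [rel]))
    return paths

# Toy KG: node -> [(relation, neighbor), ...]
toy_kg = {
    "alice": [("works_at", "acme"), ("knows", "bob")],
    "bob": [("works_at", "acme")],
}
paths = collect_paths(toy_kg, "alice", "acme", 2)
```

Here two paths support the same query, `(works_at,)` and `(knows, works_at)`; probabilistic aggregation as described above would let such mutually reinforcing paths raise each other's weight.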
https://arxiv.org/abs/2512.12613
Node importance estimation (NIE) in heterogeneous knowledge graphs is a critical yet challenging task, essential for applications such as recommendation, knowledge reasoning, and question answering. Existing methods often rely on pairwise connections, neglecting high-order dependencies among multiple entities and relations, and they treat structural and semantic signals independently, hindering effective cross-modal integration. To address these challenges, we propose MetaHGNIE, a meta-path induced hypergraph contrastive learning framework for disentangling and aligning structural and semantic information. MetaHGNIE constructs a higher-order knowledge graph via meta-path sequences, where typed hyperedges capture multi-entity relational contexts. Structural dependencies are aggregated with local attention, while semantic representations are encoded through a hypergraph transformer equipped with sparse chunking to reduce redundancy. Finally, a multimodal fusion module integrates structural and semantic embeddings under contrastive learning with auxiliary supervision, ensuring robust cross-modal alignment. Extensive experiments on benchmark NIE datasets demonstrate that MetaHGNIE consistently outperforms state-of-the-art baselines. These results highlight the effectiveness of explicitly modeling higher-order interactions and cross-modal alignment in heterogeneous knowledge graphs. Our code is available at this https URL
https://arxiv.org/abs/2512.12477
Traditional ontology design emphasizes disjoint and exhaustive top-level distinctions such as continuant vs. occurrent, abstract vs. concrete, or type vs. instance. These distinctions are used to structure unified hierarchies where every entity is classified under a single upper-level category. Wikidata, by contrast, does not enforce a singular foundational taxonomy. Instead, it accommodates multiple classification axes simultaneously under the shared root class entity. This paper analyzes the structural implications of Wikidata's polyhierarchical and multi-axial design. The Wikidata architecture enables a scalable and modular approach to ontology construction, especially suited to collaborative and evolving knowledge graphs.
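The multi-axial point can be illustrated with a toy polyhierarchy in which one class sits under several parents at once, so an item is reachable from the shared root along more than one classification axis. The class names below are invented for illustration, not actual Wikidata items:

```python
# Toy polyhierarchy: a class may have several superclasses simultaneously,
# so classification axes (artifact-ness, physicality, ...) coexist.
subclass_of = {
    "painting": ["work of art", "physical object"],
    "work of art": ["entity"],
    "physical object": ["entity"],
}

def ancestors(cls):
    """All transitive superclasses of cls (no single-parent assumption)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

`ancestors("painting")` contains both `"work of art"` and `"physical object"`, which a disjoint top-level partition in the traditional style would force into a single branch.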
https://arxiv.org/abs/2512.12260
Knowledge Graphs (KGs), thanks to their concise and efficient triple-based structure, have been widely applied in intelligent question answering, recommender systems, and other domains. However, the heterogeneous and multifaceted nature of real-world data inevitably renders the distribution of relations long-tailed, making it crucial to complete missing facts with limited samples. Previous studies are mainly based on metric matching or meta-learning, yet they either fail to fully exploit neighborhood information in the graph or overlook the distributional characteristics of contrastive signals. In this paper, we re-examine the problem from the perspective of generative representation and propose a few-shot knowledge graph completion framework that integrates a two-stage attention triple enhancer with a U-KAN-based diffusion model. Extensive experiments on two public datasets show that our method achieves new state-of-the-art results.
https://arxiv.org/abs/2512.12182
Indoor navigation remains a critical challenge for people with visual impairments. Current solutions rely mainly on infrastructure-based systems, which limits users' ability to navigate safely in dynamic environments. We propose a novel navigation approach that utilizes a foundation model to transform floor plans into navigable knowledge graphs and generate human-readable navigation instructions. Floorplan2Guide integrates a large language model (LLM) to extract spatial information from architectural layouts, reducing the manual preprocessing required by earlier floor-plan parsing methods. Experimental results indicate that few-shot learning improves navigation accuracy compared with zero-shot learning in both simulated and real-world evaluations. Claude 3.7 Sonnet achieves the highest accuracy among the evaluated models, with 92.31%, 76.92%, and 61.54% on the short, medium, and long routes, respectively, under 5-shot prompting on the MP-1 floor plan. Across all models, the success rate with graph-based spatial structures is 15.4% higher than with direct visual reasoning, which confirms that graphical representation and in-context learning enhance navigation performance and make our solution more precise for indoor navigation by Blind and Low Vision (BLV) users.
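A minimal sketch of the "floor plan to navigable graph to instructions" pipeline: the paper extracts the graph with an LLM, whereas here a hypothetical room-adjacency graph is hard-coded, and the instruction rendering is deliberately simplistic:

```python
from collections import deque

# Hypothetical room-adjacency graph, standing in for LLM-parsed output.
floor_graph = {
    "entrance": ["hallway"],
    "hallway": ["entrance", "room101", "room102"],
    "room101": ["hallway"],
    "room102": ["hallway"],
}

def route(graph, start, goal):
    """BFS shortest route over the room graph, rendered as instructions."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:       # walk predecessors back to start
                path.append(node)
                node = prev[node]
            path.reverse()
            return [f"Go to {step}" for step in path[1:]]
        for nxt in graph[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return []  # goal unreachable
```

The graph-based representation makes the route a deterministic search result, which is the contrast the abstract draws against direct visual reasoning over the raw floor-plan image.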
https://arxiv.org/abs/2512.12177
The rapid proliferation of artificial intelligence (AI) models and methods presents growing challenges for research software engineers and researchers who must select, integrate, and maintain appropriate models within complex research workflows. Model selection is often performed in an ad hoc manner, relying on fragmented metadata and individual expertise, which can undermine reproducibility, transparency, and overall research software quality. This work proposes a structured and evidence-driven approach to support AI model selection that aligns with both technical and contextual requirements. We conceptualize AI model selection as a Multi-Criteria Decision-Making (MCDM) problem and introduce an evidence-based decision-support framework that integrates automated data collection pipelines, a structured knowledge graph, and MCDM principles. Following the Design Science Research methodology, the proposed framework (ModelSelect) is empirically validated through 50 real-world case studies and comparative experiments against leading generative AI systems. The evaluation results show that ModelSelect produces reliable, interpretable, and reproducible recommendations that closely align with expert reasoning. Across the case studies, the framework achieved high coverage and strong rationale alignment in both model and library recommendation tasks, performing comparably to generative AI assistants while offering superior traceability and consistency. By framing AI model selection as an MCDM problem, this work establishes a rigorous foundation for transparent and reproducible decision support in research software engineering. The proposed framework provides a scalable and explainable pathway for integrating empirical evidence into AI model recommendation processes, ultimately improving the quality and robustness of research software decision-making.
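A weighted-sum aggregation is the simplest MCDM scheme and illustrates the scoring idea behind casting model selection as multi-criteria decision-making; the criteria, weights, and model entries below are invented, and the actual ModelSelect framework is more elaborate (evidence pipelines, a knowledge graph, rationale generation):

```python
def weighted_score(criteria_scores, weights):
    """Weighted-sum MCDM aggregation over normalized criterion scores."""
    return sum(criteria_scores[c] * w for c, w in weights.items())

# Hypothetical candidate models with scores normalized to [0, 1],
# where higher is better on every criterion.
models = {
    "model_a": {"accuracy": 0.9, "latency": 0.4, "license": 1.0},
    "model_b": {"accuracy": 0.8, "latency": 0.9, "license": 1.0},
}
weights = {"accuracy": 0.5, "latency": 0.3, "license": 0.2}
best = max(models, key=lambda m: weighted_score(models[m], weights))
```

Making the weights explicit is what buys the traceability the abstract emphasizes: the same inputs and weights always reproduce the same recommendation.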
https://arxiv.org/abs/2512.11984
As AI and web agents become pervasive in decision-making, it is critical to design intelligent systems that not only support sustainability efforts but also guard against misinformation. Greenwashing, i.e., misleading corporate sustainability claims, poses a major challenge to environmental progress. To address this challenge, we introduce EmeraldMind, a fact-centric framework integrating a domain-specific knowledge graph with retrieval-augmented generation to automate greenwashing detection. EmeraldMind builds the EmeraldGraph from diverse corporate ESG (environmental, social, and governance) reports, surfacing verifiable evidence, often missing in generic knowledge bases, and supporting large language models in claim assessment. The framework delivers justification-centric classifications, presenting transparent, evidence-backed verdicts and abstaining responsibly when claims cannot be verified. Experiments on a new greenwashing claims dataset demonstrate that EmeraldMind achieves competitive accuracy, greater coverage, and superior explanation quality compared to generic LLMs, without the need for fine-tuning or retraining.
https://arxiv.org/abs/2512.11506
Cultivating higher-order cognitive abilities -- such as knowledge integration, critical thinking, and creativity -- in modern STEM education necessitates a pedagogical shift from passive knowledge transmission to active Socratic construction. Although Large Language Models (LLMs) hold promise for interdisciplinary STEM education, current methodologies employing Prompt Engineering (PE), Supervised Fine-tuning (SFT), or standard Reinforcement Learning (RL) often fall short of supporting this paradigm. Existing methods are hindered by three fundamental challenges: the inability to dynamically model latent student cognitive states; the severe reward sparsity and delay inherent in long-term educational goals; and a tendency toward policy collapse and loss of strategic diversity due to reliance on behavioral cloning. Recognizing the unobservability and dynamic complexity of these interactions, we formalize the Socratic Interdisciplinary Instructional Problem (SIIP) as a structured Partially Observable Markov Decision Process (POMDP), demanding simultaneous global exploration and fine-grained policy refinement. To this end, we propose ERL4SIIP, a novel Evolutionary Reinforcement Learning (ERL) framework specifically tailored for this domain. ERL4SIIP integrates: (1) a dynamic student simulator grounded in a STEM knowledge graph for latent state modeling; (2) a Hierarchical Reward Mechanism that decomposes long-horizon goals into dense signals; and (3) a LoRA-Division based optimization strategy coupling evolutionary algorithms for population-level global search with PPO for local gradient ascent.
https://arxiv.org/abs/2512.11930
This paper presents a psychologically-aware conversational agent designed to enhance both learning performance and emotional well-being in educational settings. The system combines Large Language Models (LLMs), a knowledge graph-enhanced BERT (KG-BERT), and a bidirectional Long Short-Term Memory (LSTM) with attention to classify students' cognitive and affective states in real time. Unlike prior chatbots limited to either tutoring or affective support, our approach leverages multimodal data-including textual semantics, prosodic speech features, and temporal behavioral trends-to infer engagement, stress, and conceptual understanding. A pilot study with university students demonstrated improved motivation, reduced stress, and moderate academic gains compared to baseline methods. These results underline the promise of integrating semantic reasoning, multimodal fusion, and temporal modeling to support adaptive, student-centered educational interventions.
https://arxiv.org/abs/2512.10441
Large language models (LLMs) like Claude, Mistral IA, and GPT-4 excel in NLP but lack structured knowledge, leading to factual inconsistencies. We address this by integrating Knowledge Graphs (KGs) via KG-BERT to enhance grounding and reasoning. Experiments show significant gains in knowledge-intensive tasks such as question answering and entity linking. This approach improves factual reliability and enables more context-aware next-generation LLMs.
https://arxiv.org/abs/2512.10440
Personalizing Large Language Model (LLM) agents requires conditioning them on user-specific data, creating a critical trade-off between task utility and data disclosure. While the utility of adding user data often exhibits diminishing returns (i.e., submodularity), enabling near-optimal greedy selection, real-world personalization is complicated by structural constraints. These include logical dependencies (e.g., selecting fact A requires fact B), categorical quotas (e.g., select at most one writing style), and hierarchical rules (e.g., select at most two social media preferences, of which at most one can be for a professional network). These constraints violate the assumptions of standard subset selection algorithms. We propose a principled method to formally model such constraints. We introduce a compilation process that transforms a user's knowledge graph with dependencies into a set of abstract macro-facets. Our central result is a proof that common hierarchical and quota-based constraints over these macro-facets form a valid laminar matroid. This theoretical characterization lets us cast structured personalization as submodular maximization under a matroid constraint, enabling greedy with constant-factor guarantees (and (1-1/e) via continuous greedy) for a much richer and more realistic class of problems.
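The abstract's quota example can be sketched as greedy selection under a laminar family of capacities: each constraint is a set of facets with a cap, and the sets are nested or disjoint. Fixed gains stand in for submodular marginal values here (a true submodular greedy would re-evaluate marginals each round), and the facet names are illustrative:

```python
# Laminar family over macro-facets: each entry is (facet set, capacity).
# The second set is nested inside the first, as laminarity requires.
laminar = [
    ({"reddit", "twitter", "linkedin"}, 2),  # at most two social media facets
    ({"linkedin"}, 1),                       # of which at most one professional
]

def independent(selected, candidate):
    """A set is independent iff adding candidate exceeds no capacity."""
    chosen = selected | {candidate}
    return all(len(chosen & group) <= cap for group, cap in laminar)

def greedy_select(gains):
    """Greedy on the matroid: take facets in gain order while independent."""
    selected = set()
    for facet in sorted(gains, key=gains.get, reverse=True):
        if independent(selected, facet):
            selected.add(facet)
    return selected

chosen = greedy_select({"reddit": 0.9, "twitter": 0.8, "linkedin": 0.7})
```

With these gains the outer quota fills first, so `linkedin` is rejected; the matroid structure is exactly what licenses this greedy with a constant-factor guarantee.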
https://arxiv.org/abs/2512.11907
Knowledge Base Question Answering (KBQA) challenges models to bridge the gap between natural language and strict knowledge graph schemas by generating executable logical forms. While Large Language Models (LLMs) have advanced this field, current approaches often struggle with a dichotomy of failure: they either generate hallucinated queries without verifying schema existence or exhibit rigid, template-based reasoning that mimics synthesized traces without true comprehension of the environment. To address these limitations, we present KBQA-R1, a framework that shifts the paradigm from text imitation to interaction optimization via Reinforcement Learning. Treating KBQA as a multi-turn decision process, our model learns to navigate the knowledge base using a list of actions, leveraging Group Relative Policy Optimization (GRPO) to refine its strategies based on concrete execution feedback rather than static supervision. Furthermore, we introduce Referenced Rejection Sampling (RRS), a data synthesis method that resolves cold-start challenges by strictly aligning reasoning traces with ground-truth action sequences. Extensive experiments on WebQSP, GrailQA, and GraphQuestions demonstrate that KBQA-R1 achieves state-of-the-art performance, effectively grounding LLM reasoning in verifiable execution.
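The group-relative part of GRPO standardizes each sampled trajectory's reward against the statistics of its own group, so no learned value baseline is needed; a minimal sketch of that advantage computation (the reward values are illustrative, e.g. 1.0 for a query that executes to the right answer):

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts for one question: two executed correctly, two did not.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Successful rollouts get positive advantage and failed ones negative, which is how concrete execution feedback, rather than static supervision, steers the policy.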
https://arxiv.org/abs/2512.10999
Recent advances in large language models (LLMs) have enabled strong reasoning over both structured and unstructured knowledge. When grounded on knowledge graphs (KGs), however, prevailing pipelines rely on heavy neural encoders to embed and score symbolic paths or on repeated LLM calls to rank candidates, leading to high latency, GPU cost, and opaque decisions that hinder faithful, scalable deployment. We propose PathHD, a lightweight and encoder-free KG reasoning framework that replaces neural path scoring with hyperdimensional computing (HDC) and uses only a single LLM call per query. PathHD encodes relation paths into block-diagonal GHRR hypervectors, ranks candidates with blockwise cosine similarity and Top-K pruning, and then performs a one-shot LLM adjudication to produce the final answer together with cited supporting paths. Technically, PathHD is built on three ingredients: (i) an order-aware, non-commutative binding operator for path composition, (ii) a calibrated similarity for robust hypervector-based retrieval, and (iii) a one-shot adjudication step that preserves interpretability while eliminating per-path LLM scoring. On WebQSP, CWQ, and the GrailQA split, PathHD (i) attains comparable or better Hits@1 than strong neural baselines while using one LLM call per query; (ii) reduces end-to-end latency by 40-60% and GPU memory by 3-5x thanks to encoder-free retrieval; and (iii) delivers faithful, path-grounded rationales that improve error diagnosis and controllability. These results indicate that carefully designed HDC representations provide a practical substrate for efficient KG-LLM reasoning, offering a favorable accuracy-efficiency-interpretability trade-off.
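An order-aware binding can be sketched with bipolar hypervectors: rotating one operand before elementwise multiplication breaks commutativity, so the path "r1 then r2" encodes differently from "r2 then r1". This is a generic HDC illustration under that simple scheme, not the paper's block-diagonal GHRR construction:

```python
import random

def rotate(v, n=1):
    """Cyclic shift, acting as the permutation that injects order."""
    return v[-n:] + v[:-n]

def bind(a, b):
    """Order-aware binding: permute the second operand, then multiply.
    The rotation makes bind(a, b) differ from bind(b, a)."""
    return [x * y for x, y in zip(a, rotate(b))]

def cosine(a, b):
    # Bipolar +/-1 vectors all have norm sqrt(dim), so cosine = dot / dim.
    return sum(x * y for x, y in zip(a, b)) / len(a)

random.seed(0)
dim = 1024
r1 = [random.choice((-1, 1)) for _ in range(dim)]
r2 = [random.choice((-1, 1)) for _ in range(dim)]

same_order = cosine(bind(r1, r2), bind(r1, r2))
swapped = cosine(bind(r1, r2), bind(r2, r1))
```

`same_order` is exactly 1.0 while `swapped` is near 0, which is the property that lets path hypervectors be compared by cosine similarity without confusing relation order.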
https://arxiv.org/abs/2512.09369
Graph-based Retrieval-Augmented Generation (GraphRAG) enhances Large Language Models (LLMs) by incorporating external knowledge from linearized subgraphs retrieved from knowledge graphs. However, LLMs struggle to interpret the relational and topological information in these inputs, resulting in hallucinations that are inconsistent with the retrieved knowledge. To analyze how LLMs attend to and retain structured knowledge during generation, we propose two lightweight interpretability metrics: Path Reliance Degree (PRD), which measures over-reliance on shortest-path triples, and Semantic Alignment Score (SAS), which assesses how well the model's internal representations align with the retrieved knowledge. Through empirical analysis on a knowledge-based QA task, we identify failure patterns associated with over-reliance on salient paths and weak semantic grounding, as indicated by high PRD and low SAS scores. We further develop a lightweight post-hoc hallucination detector, Graph Grounding and Alignment (GGA), which outperforms strong semantic and confidence-based baselines across AUC and F1. By grounding hallucination analysis in mechanistic interpretability, our work offers insights into how structural limitations in LLMs contribute to hallucinations, informing the design of more reliable GraphRAG systems in the future.
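One plausible reading of PRD is the share of a model's attention mass that falls on shortest-path triples, so a high value flags over-reliance on the salient path. The formulation below is a sketch under that assumption, not the paper's exact definition, and the attention weights are invented:

```python
def path_reliance_degree(triple_attention, shortest_path_triples):
    """Sketch of PRD: fraction of attention mass on shortest-path triples."""
    total = sum(triple_attention.values())
    on_path = sum(
        weight
        for triple, weight in triple_attention.items()
        if triple in shortest_path_triples
    )
    return on_path / total if total else 0.0

# Hypothetical per-triple attention from one generation step.
attention = {"t1": 0.6, "t2": 0.3, "t3": 0.1}
prd = path_reliance_degree(attention, shortest_path_triples={"t1"})
```

Under this reading, a PRD near 1.0 with a low semantic-alignment score would match the failure pattern the abstract associates with hallucination.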
https://arxiv.org/abs/2512.09148
Accurate interpretation of pediatric dental clinical records and safe antibiotic prescribing remain persistent challenges in dental informatics. Traditional rule-based clinical decision support systems struggle with unstructured dental narratives, incomplete radiographic descriptions, and complex safety constraints. To address these limitations, this study proposes a Knowledge-Guided Large Language Model (KG-LLM) that integrates a pediatric dental knowledge graph, retrieval-augmented generation (RAG), and a multi-stage safety validation pipeline for evidence-grounded antibiotic recommendation. The framework first employs a clinical NER/RE module to extract structured entities and relations from dental notes and radiology reports. Relevant guidelines, drug-safety rules, and analogous historical cases are subsequently retrieved from the knowledge graph and supplied to the LLM for diagnostic summarization and dose-drug-duration prediction. Safety assurance is achieved through a dual-layer validation mechanism combining deterministic rule checking with a learned classifier for detecting allergies, contraindications, and dosing errors. Experiments on 32,000 de-identified pediatric dental visit records demonstrate the effectiveness of the proposed approach. Compared with a domain-adapted Llama-2 clinical baseline, KG-LLM improves record-understanding performance (F1: 0.914 vs. 0.867), drug-dose-duration accuracy (Top-1: 0.782 vs. 0.716), and reduces unsafe antibiotic suggestions by 50%. Additional evaluation across summary quality, recommendation accuracy, and global safety scores further confirms the robustness of the system. Ablation analyses indicate that the knowledge graph, RAG, and safety modules each contribute substantially to clinical reliability and interpretability.
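The deterministic half of the dual-layer validation can be sketched as explicit rule checks over a proposed prescription; the drug limit below is an invented placeholder for illustration only, not clinical guidance, and the real pipeline pairs such rules with a learned classifier:

```python
# Hypothetical rule table: maximum daily dose in mg per kg of body weight.
# Placeholder value for illustration -- NOT real dosing guidance.
MAX_DAILY_MG_PER_KG = {"amoxicillin": 50}

def violations(drug, daily_mg, weight_kg, allergies):
    """Deterministic safety checks; returns the list of rule violations."""
    issues = []
    if drug in allergies:
        issues.append("allergy")
    limit = MAX_DAILY_MG_PER_KG.get(drug)
    if limit is not None and daily_mg > limit * weight_kg:
        issues.append("overdose")
    return issues
```

An empty list means the deterministic layer passes and the recommendation proceeds to the learned validator; any non-empty list blocks the suggestion, which is how hard constraints stay independent of LLM behavior.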
https://arxiv.org/abs/2512.09127
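The dual-layer validation mechanism described above can be sketched as a deterministic rule pass followed by a learned risk classifier. The rule contents, the per-kilogram dosing cap, and the classifier stub below are illustrative assumptions, not the paper's actual safety rules:

```python
# Illustrative sketch of dual-layer safety validation: layer 1 applies
# deterministic checks (allergies, dosing caps); layer 2 runs a trained
# classifier for residual risk. All thresholds here are made up.

MAX_DAILY_MG_PER_KG = {"amoxicillin": 90}  # hypothetical pediatric cap

def rule_check(rx, patient):
    """Layer 1: deterministic allergy and dose-limit checks."""
    errors = []
    if rx["drug"] in patient["allergies"]:
        errors.append("allergy")
    cap = MAX_DAILY_MG_PER_KG.get(rx["drug"])
    if cap is not None and rx["daily_mg"] > cap * patient["weight_kg"]:
        errors.append("overdose")
    return errors

def validate(rx, patient, risk_clf):
    """Run both layers; layer 2 only fires if layer 1 passes."""
    errors = rule_check(rx, patient)
    if not errors and risk_clf(rx, patient) > 0.5:
        errors.append("model-flagged")
    return errors

patient = {"allergies": {"penicillin"}, "weight_kg": 20}
rx = {"drug": "amoxicillin", "daily_mg": 2500}
print(validate(rx, patient, lambda rx, p: 0.1))  # prints ['overdose']
```

Keeping the deterministic layer first means hard safety violations are caught even when the learned classifier is uncertain, which matches the paper's emphasis on combining rule checking with a trained detector.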
Ontology-based knowledge graph (KG) construction is a core technology that enables multidimensional understanding and advanced reasoning over domain knowledge. Industrial standards, in particular, contain extensive technical information and complex rules presented in highly structured formats that combine tables, scopes of application, constraints, exceptions, and numerical calculations, making KG construction especially challenging. In this study, we propose a method that organizes such documents into a hierarchical semantic structure, decomposes sentences and tables into atomic propositions derived from conditional and numerical rules, and integrates them into an ontology-knowledge graph through LLM-based triple extraction. Our approach captures both the hierarchical and logical structures of documents, effectively representing domain-specific semantics that conventional methods fail to reflect. To verify its effectiveness, we constructed rule, table, and multi-hop QA datasets, as well as a toxic clause detection dataset, from industrial standards, and implemented an ontology-aware KG-RAG framework for comparative evaluation. Experimental results show that our method achieves significant performance improvements across all QA types compared to existing KG-RAG approaches. This study demonstrates that reliable and scalable knowledge representation is feasible even for industrial documents with intertwined conditions, constraints, and scopes, contributing to future domain-specific RAG development and intelligent document management.
https://arxiv.org/abs/2512.08398
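The decomposition of conditional rules into atomic propositions and KG triples can be sketched as follows. The schema (condition/constraint fields, the `implies` relation) is an illustrative assumption; the paper performs this extraction with an LLM rather than a hand-written parse:

```python
# Sketch of turning one conditional rule from an industrial standard into
# atomic propositions and ontology triples. Field names and relation
# labels are assumed for demonstration only.

def to_atomic(rule_id, condition, constraint):
    """Represent one conditional rule as two atomic propositions."""
    return [
        {"id": f"{rule_id}.cond", "type": "condition", "text": condition},
        {"id": f"{rule_id}.cons", "type": "constraint", "text": constraint},
    ]

def to_triples(props):
    """Link a rule's condition proposition to its constraint in the KG."""
    cond, cons = props
    return [
        (cond["id"], "has_text", cond["text"]),
        (cons["id"], "has_text", cons["text"]),
        (cond["id"], "implies", cons["id"]),
    ]

props = to_atomic("R7", "pipe diameter > 50 mm", "wall thickness >= 3 mm")
triples = to_triples(props)
```

Splitting each rule into condition and constraint nodes is what lets a downstream KG-RAG system retrieve the applicable scope and its obligation together, rather than as disconnected sentence fragments.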
Predicting diseases solely from patient-side information, such as demographics and self-reported symptoms, has attracted significant research attention due to its potential to enhance patient awareness, facilitate early healthcare engagement, and improve healthcare system efficiency. However, existing approaches encounter critical challenges, including imbalanced disease distributions and a lack of interpretability, resulting in biased or unreliable predictions. To address these issues, we propose the Knowledge graph-enhanced, Prototype-aware, and Interpretable (KPI) framework. KPI systematically integrates structured and trusted medical knowledge into a unified disease knowledge graph, constructs clinically meaningful disease prototypes, and employs contrastive learning to enhance predictive accuracy, which is particularly important for long-tailed diseases. Additionally, KPI utilizes large language models (LLMs) to generate patient-specific, medically relevant explanations, thereby improving interpretability and reliability. Extensive experiments on real-world datasets demonstrate that KPI outperforms state-of-the-art methods in predictive accuracy and provides clinically valid explanations that closely align with patient narratives, highlighting its practical value for patient-centered healthcare delivery.
https://arxiv.org/abs/2512.08261
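The prototype-matching idea in the entry above can be illustrated as scoring a patient embedding against per-disease prototype vectors. The 3-d vectors below are toy values; in KPI the embeddings and prototypes are learned with contrastive training over the disease knowledge graph:

```python
# Toy sketch of prototype-aware prediction: rank diseases by cosine
# similarity between a patient embedding and each disease prototype.
# Vectors are made-up illustrations, not learned representations.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def predict(patient_vec, prototypes):
    """Return diseases ranked by similarity to their prototype."""
    scores = {d: cosine(patient_vec, p) for d, p in prototypes.items()}
    return sorted(scores, key=scores.get, reverse=True)

prototypes = {
    "flu": [0.9, 0.1, 0.0],
    "migraine": [0.1, 0.9, 0.1],
}
ranking = predict([0.8, 0.2, 0.0], prototypes)  # "flu" ranks first
```

Anchoring predictions to explicit prototypes is also what gives the framework a handle for long-tailed diseases: rare classes keep a dedicated prototype rather than being drowned out by head classes during training.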