This paper presents a novel research analytical IT system grounded in Martin Heidegger's Fundamental Ontology, distinguishing between beings (das Seiende) and Being (das Sein). The system employs two modally distinct, descriptively complete languages: a categorical language of beings for processing user inputs and an existential language of Being for internal analysis. These languages are bridged via a phenomenological reduction module, enabling the system to analyze user queries (including questions, answers, and dialogues among IT specialists), identify recursive and self-referential structures, and provide actionable insights in categorical terms. Unlike contemporary systems limited to categorical analysis, this approach leverages Heidegger's phenomenological existential analysis to uncover deeper ontological patterns in query processing, aiding in resolving logical traps in complex interactions, such as metaphor usage in IT contexts. The path to full realization involves formalizing the language of Being by a research team based on Heidegger's Fundamental Ontology; given the existing completeness of the language of beings, this reduces the system's computability to completeness, paving the way for a universal query analysis tool. The paper presents the system's architecture, operational principles, technical implementation, use cases--including a case based on real IT specialist dialogues--comparative evaluation with existing tools, and its advantages and limitations.
https://arxiv.org/abs/2504.12977
Chinese-Vicuna is an open-source, resource-efficient language model designed to bridge the gap in Chinese instruction-following capabilities by fine-tuning Meta's LLaMA architecture using Low-Rank Adaptation (LoRA). Targeting low-resource environments, it enables cost-effective deployment on consumer GPUs (e.g., RTX-2080Ti for 7B models) and supports domain-specific adaptation in fields like healthcare and law. By integrating hybrid datasets (BELLE and Guanaco) and 4-bit quantization (QLoRA), the model achieves competitive performance in tasks such as translation, code generation, and domain-specific Q&A. The project provides a comprehensive toolkit for model conversion, CPU inference, and multi-turn dialogue interfaces, emphasizing accessibility for researchers and developers. Evaluations indicate strong results across medical tasks, multi-turn dialogue coherence, and real-time legal updates. Chinese-Vicuna's modular design, open-source ecosystem, and community-driven enhancements position it as a versatile foundation for Chinese LLM applications.
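The LoRA idea behind this kind of fine-tuning is compact enough to show in full: freeze the pretrained weight W and train only a low-rank update scaled by alpha/r. The following is a toy numpy sketch of that math, with illustrative dimensions; it is not the project's actual training code, which applies LoRA to LLaMA layers via standard tooling.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 64, 64, 8, 16  # toy sizes; real LLaMA layers are far larger
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight

# LoRA trains only the low-rank factors B (d_out x r) and A (r x d_in).
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))               # B starts at zero, so the update starts at zero

def lora_forward(x, W, A, B, alpha, r):
    """Adapted layer: y = W x + (alpha / r) * B A x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# Before any training step, B == 0, so the adapted layer equals the frozen one.
assert np.allclose(lora_forward(x, W, A, B, alpha, r), W @ x)

# The trainable-parameter fraction is why this fits on consumer GPUs.
frac = (A.size + B.size) / (W.size + A.size + B.size)
print(f"trainable fraction: {frac:.3%}")
```

At these toy dimensions the trainable fraction is already small, and it shrinks further as the frozen matrices grow while r stays fixed.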
https://arxiv.org/abs/2504.12737
We present KODIS, a dyadic dispute resolution corpus containing thousands of dialogues from over 75 countries. Motivated by a theoretical model of culture and conflict, participants engage in a typical customer service dispute designed by experts to evoke strong emotions and conflict. The corpus contains a rich set of dispositional, process, and outcome measures. The initial analysis supports theories of how anger expressions lead to escalatory spirals and highlights cultural differences in emotional expression. We make this corpus and data collection framework available to the community.
https://arxiv.org/abs/2504.12723
Movie Audio Description (AD) aims to narrate visual content during dialogue-free segments, particularly benefiting blind and visually impaired (BVI) audiences. Compared with general video captioning, AD demands plot-relevant narration with explicit character name references, posing unique challenges in movie AD. To identify active main characters and focus on storyline-relevant regions, we propose FocusedAD, a novel framework that delivers character-centric movie audio descriptions. It includes: (i) a Character Perception Module (CPM) for tracking character regions and linking them to names; (ii) a Dynamic Prior Module (DPM) that injects contextual cues from prior ADs and subtitles via learnable soft prompts; and (iii) a Focused Caption Module (FCM) that generates narrations enriched with plot-relevant details and named characters. To overcome limitations in character identification, we also introduce an automated pipeline for building character query banks. FocusedAD achieves state-of-the-art performance on multiple benchmarks, including strong zero-shot results on MAD-eval-Named and our newly proposed Cinepile-AD dataset. Code and data will be released at this https URL.
https://arxiv.org/abs/2504.12157
In this paper, we propose a novel system that integrates state-of-the-art, domain-specific large language models with advanced information retrieval techniques to deliver comprehensive and context-aware responses. Our approach facilitates seamless interaction among diverse components, enabling cross-validation of outputs to produce accurate, high-quality responses enriched with relevant data, images, tables, and other modalities. We demonstrate the system's capability to enhance response precision by leveraging a robust question-answering model, significantly improving the quality of dialogue generation. The system provides an accessible platform for real-time, high-fidelity interactions, allowing users to benefit from efficient human-computer interaction, precise retrieval, and simultaneous access to a wide range of literature and data. This dramatically improves the research efficiency of professionals in the biomedical and pharmaceutical domains and facilitates faster, more informed decision-making throughout the R&D process. Furthermore, the system proposed in this paper is available at this https URL.
https://arxiv.org/abs/2504.12341
Large language models (LLMs) hold great promise for assisting clinical interviews due to their fluent interactive capabilities and extensive medical knowledge. However, the lack of high-quality interview dialogue data and widely accepted evaluation methods has significantly impeded progress. To address this, we propose CliniChat, a framework that integrates multi-source knowledge to enable LLMs to simulate real-world clinical interviews. It consists of two modules, Clini-Recon and Clini-Eval, responsible for reconstructing and evaluating interview dialogues, respectively. By incorporating three sources of knowledge, Clini-Recon transforms clinical notes into systematic, professional, and empathetic interview dialogues. Clini-Eval combines a comprehensive evaluation metric system with a two-phase automatic evaluation approach, enabling LLMs to assess interview performance like experts. We contribute MedQA-Dialog, a high-quality synthetic interview dialogue dataset, and CliniChatGLM, a model specialized for clinical interviews. Experimental results demonstrate that CliniChatGLM's interview capabilities undergo a comprehensive upgrade, particularly in history-taking, achieving state-of-the-art performance.
https://arxiv.org/abs/2504.10418
Traditional text-based person ReID assumes that person descriptions from witnesses are complete and provided at once. However, in real-world scenarios, such descriptions are often partial or vague. To address this limitation, we introduce a new task called interactive person re-identification (Inter-ReID). Inter-ReID is a dialogue-based retrieval task that iteratively refines initial descriptions through ongoing interactions with the witnesses. To facilitate the study of this new task, we construct a dialogue dataset that incorporates multiple types of questions by decomposing fine-grained attributes of individuals. We further propose LLaVA-ReID, a question model that generates targeted questions based on visual and textual contexts to elicit additional details about the target person. Leveraging a looking-forward strategy, we prioritize the most informative questions as supervision during training. Experimental results on both Inter-ReID and text-based ReID benchmarks demonstrate that LLaVA-ReID significantly outperforms baselines.
https://arxiv.org/abs/2504.10174
The rise of LLM-driven AI characters raises safety concerns, particularly for vulnerable human users with psychological disorders. To address these risks, we propose EmoAgent, a multi-agent AI framework designed to evaluate and mitigate mental health hazards in human-AI interactions. EmoAgent comprises two components: EmoEval simulates virtual users, including those portraying mentally vulnerable individuals, to assess mental health changes before and after interactions with AI characters. It uses clinically proven psychological and psychiatric assessment tools (PHQ-9, PDI, PANSS) to evaluate mental risks induced by LLMs. EmoGuard serves as an intermediary, monitoring users' mental status, predicting potential harm, and providing corrective feedback to mitigate risks. Experiments conducted with popular character-based chatbots show that emotionally engaging dialogues can lead to psychological deterioration in vulnerable users, with mental state deterioration observed in more than 34.4% of the simulations. EmoGuard significantly reduces these deterioration rates, underscoring its role in ensuring safer AI-human interactions. Our code is available at: this https URL
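EmoEval scores simulated users with standard clinical instruments before and after chats. As a concrete illustration, here is a minimal scorer for the generic PHQ-9 (nine items, each rated 0-3, with the standard severity cutoffs); this reflects the public instrument's scoring rule, not the paper's evaluation pipeline.

```python
def phq9_score(item_scores):
    """Total a PHQ-9 questionnaire: nine items, each rated 0-3."""
    if len(item_scores) != 9 or not all(0 <= s <= 3 for s in item_scores):
        raise ValueError("PHQ-9 expects nine item scores in the range 0-3")
    return sum(item_scores)

def phq9_severity(total):
    """Map a PHQ-9 total (0-27) onto the standard severity bands."""
    bands = [(4, "minimal"), (9, "mild"), (14, "moderate"),
             (19, "moderately severe"), (27, "severe")]
    for upper, label in bands:
        if total <= upper:
            return label
    raise ValueError("PHQ-9 totals cannot exceed 27")

# Comparing pre- and post-interaction totals is one simple way to flag deterioration.
before = phq9_score([1] * 9)                       # -> 9, "mild"
after = phq9_score([2, 2, 2, 1, 1, 1, 1, 1, 1])    # -> 12, "moderate"
print(phq9_severity(before), "->", phq9_severity(after), "| worsened:", after > before)
```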
https://arxiv.org/abs/2504.09689
This study addresses the challenge of ambiguity in knowledge graph question answering (KGQA). While recent KGQA systems have made significant progress, particularly with the integration of large language models (LLMs), they typically assume user queries are unambiguous, an assumption that rarely holds in real-world applications. To address these limitations, we propose a novel framework that dynamically handles both entity ambiguity (e.g., distinguishing between entities with similar names) and intent ambiguity (e.g., clarifying different interpretations of user queries) through interactive clarification. Our approach employs a Bayesian inference mechanism to quantify query ambiguity and guide LLMs in determining when and how to request clarification from users within a multi-turn dialogue framework. We further develop a two-agent interaction framework where an LLM-based user simulator enables iterative refinement of logical forms through simulated user feedback. Experimental results on the WebQSP and CWQ datasets demonstrate that our method significantly improves performance by effectively resolving semantic ambiguities. Additionally, we contribute a refined dataset of disambiguated queries, derived from interaction histories, to facilitate future research in this direction.
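A clarification trigger of this kind can be pictured as entropy over a posterior on candidate interpretations, updated by Bayes' rule as the user answers. The sketch below is illustrative only: the 1-bit threshold and the likelihood values are assumptions, not the paper's.

```python
import math

def posterior_entropy(probs):
    """Shannon entropy (bits) of a posterior over candidate interpretations."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def needs_clarification(probs, threshold=1.0):
    """Ask a clarifying question when the posterior is too flat to commit."""
    return posterior_entropy(probs) > threshold

def bayes_update(prior, likelihood):
    """Re-weight each interpretation by how well it explains the user's answer."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Three readings of an ambiguous entity mention, initially near-uniform.
prior = [0.4, 0.35, 0.25]
assert needs_clarification(prior)           # flat posterior -> ask the user

# The user's reply strongly favours the first reading.
posterior = bayes_update(prior, [0.9, 0.05, 0.05])
assert not needs_clarification(posterior)   # peaked posterior -> answer directly
```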
https://arxiv.org/abs/2504.09665
Current AI systems based on probabilistic neural networks, such as large language models (LLMs), have demonstrated remarkable generative capabilities yet face critical challenges including hallucination, unpredictability, and misalignment with human decision-making. These issues fundamentally stem from the over-reliance on randomized (probabilistic) neural networks, oversimplified models of biological neural networks, while neglecting the role of procedural reasoning (chain-of-thought) in trustworthy decision-making. Inspired by the human cognitive duality of fluid intelligence (flexible generation) and crystallized intelligence (structured knowledge), this study proposes a dual-channel intelligent architecture that integrates probabilistic generation (LLMs) with white-box procedural reasoning (chain-of-thought) to construct interpretable, continuously learnable, and human-aligned AI systems. Concretely, this work: (1) redefines chain-of-thought as a programmable crystallized intelligence carrier, enabling dynamic knowledge evolution and decision verification through multi-turn interaction frameworks; (2) introduces a task-driven modular network design that explicitly demarcates the functional boundaries between randomized generation and procedural control to address trustworthiness in vertical-domain applications; (3) demonstrates that multi-turn interaction is a necessary condition for intelligence emergence, with dialogue depth positively correlating with the system's degree of human alignment. This research not only establishes a new paradigm for trustworthy AI deployment but also provides theoretical foundations for next-generation human-AI collaborative systems.
https://arxiv.org/abs/2504.09301
Existing computer vision (CV)-based structural damage identification models demonstrate notable accuracy in categorizing and localizing damage. However, these models present several critical limitations that hinder their practical application in civil engineering (CE). Primarily, their ability to recognize damage types remains constrained, preventing comprehensive analysis of the highly varied and complex conditions encountered in real-world CE structures. Second, these models lack linguistic capabilities, rendering them unable to articulate structural damage characteristics through natural language descriptions. With the continuous advancement of artificial intelligence (AI), large multi-modal models (LMMs) have emerged as a transformative solution, enabling the unified encoding and alignment of textual and visual data. These models can autonomously generate detailed descriptive narratives of structural damage while demonstrating robust generalization across diverse scenarios and tasks. This study introduces SDIGLM, an innovative LMM for structural damage identification, developed based on the open-source VisualGLM-6B architecture. To address the challenge of adapting LMMs to the intricate and varied operating conditions in CE, this work integrates a U-Net-based semantic segmentation module to generate defect segmentation maps as visual Chain of Thought (CoT). Additionally, a multi-round dialogue fine-tuning dataset is constructed to enhance logical reasoning, complemented by a language CoT formed through prompt engineering. By leveraging this multi-modal CoT, SDIGLM surpasses general-purpose LMMs in structural damage identification, achieving an accuracy of 95.24% across various infrastructure types. Moreover, the model effectively describes damage characteristics such as hole size, crack direction, and corrosion severity.
https://arxiv.org/abs/2504.11477
The evolution of conversational agents has been driven by the need for more contextually aware systems that can effectively manage dialogue over extended interactions. To address the limitations of existing models in capturing and utilizing long-term conversational history, we propose a novel framework that integrates Deep Canonical Correlation Analysis (DCCA) for discourse-level understanding. This framework learns discourse tokens to capture relationships between utterances and their surrounding context, enabling a better understanding of long-term dependencies. Experiments on the Ubuntu Dialogue Corpus demonstrate a significant enhancement in response selection, reflected in improved automatic evaluation metric scores. The results highlight the potential of DCCA to improve dialogue systems by allowing them to filter out irrelevant context and retain critical discourse information for more accurate response retrieval.
https://arxiv.org/abs/2504.09094
Multiturn dialogue models aim to generate human-like responses by leveraging conversational context, consisting of utterances from previous exchanges. Existing methods often neglect the interactions between these utterances or treat all of them as equally significant. This paper introduces a discourse-aware framework for response selection in retrieval-based dialogue systems. The proposed model first encodes each utterance and response with contextual, positional, and syntactic features using Multi-view Canonical Correlation Analysis (MCCA). It then learns discourse tokens that capture relationships between an utterance and its surrounding turns in a shared subspace via Canonical Correlation Analysis (CCA). This two-step approach effectively integrates semantic and syntactic features to build discourse-level understanding. Experiments on the Ubuntu Dialogue Corpus demonstrate that our model achieves significant improvements in automatic evaluation metrics, highlighting its effectiveness in response selection.
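The CCA step can be illustrated on synthetic two-view data: find one projection per view such that the projected variables are maximally correlated. Below is a numpy-only sketch of plain CCA on a toy shared-signal dataset; the paper's MCCA/CCA pipeline over utterance features is more involved, and the "views" here are only stand-ins for an utterance and its context.

```python
import numpy as np

def cca(X, Y, k=1, reg=1e-6):
    """Canonical Correlation Analysis on row-sample matrices X (n x p), Y (n x q)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])  # regularized covariances
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy))
    Wx = inv_sqrt(Sxx) @ U[:, :k]      # projection for view 1 (e.g. an utterance)
    Wy = inv_sqrt(Syy) @ Vt.T[:, :k]   # projection for view 2 (e.g. its context)
    return Wx, Wy, s[:k]

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))                        # shared latent "discourse" signal
X = np.hstack([z, rng.normal(size=(500, 3))])        # view 1: signal plus noise dims
Y = np.hstack([-2 * z, rng.normal(size=(500, 3))])   # view 2 carries the same signal
Wx, Wy, corrs = cca(X, Y)
print(f"top canonical correlation: {corrs[0]:.3f}")  # close to 1.0 by construction
```

Because both views contain the same latent signal, CCA recovers it and the top canonical correlation approaches 1, while the independent noise dimensions are ignored.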
https://arxiv.org/abs/2504.09073
Writing well requires not only expressing ideas but also refining them through revision, a process facilitated by reflection. Prior research suggests that feedback delivered through dialogues, such as those in writing center tutoring sessions, can help writers reflect more thoughtfully on their work compared to static feedback. Recent advancements in multi-modal large language models (LLMs) now offer new possibilities for supporting interactive and expressive voice-based reflection in writing. In particular, we propose that LLM-generated static feedback can be repurposed as conversation starters, allowing writers to seek clarification, request examples, and ask follow-up questions, thereby fostering deeper reflection on their writing. We argue that voice-based interaction can naturally facilitate this conversational exchange, encouraging writers' engagement with higher-order concerns, facilitating iterative refinement of their reflections, and reducing cognitive load compared to text-based interactions. To investigate these effects, we propose a formative study exploring how text vs. voice input influences writers' reflection and subsequent revisions. Findings from this study will inform the design of intelligent and interactive writing tools, offering insights into how voice-based interactions with LLM-powered conversational agents can support reflection and revision.
https://arxiv.org/abs/2504.08687
Are we running out of learning signal? Predicting the next word in an existing text has turned out to be a powerful signal, at least at scale. But there are signs that we are running out of this resource. In recent months, interaction between learner and feedback-giver has come into focus, both for "alignment" (with a reward model judging the quality of instruction following attempts) and for improving "reasoning" (process- and outcome-based verifiers judging reasoning steps). In this paper, we explore to what extent synthetic interaction in what we call Dialogue Games -- goal-directed and rule-governed activities driven predominantly by verbal actions -- can provide a learning signal, and how this signal can be used. We introduce an environment for producing such interaction data (with the help of a Large Language Model as counterpart to the learner model), both offline and online. We investigate the effects of supervised fine-tuning on this data, as well as reinforcement learning setups such as DPO and GRPO, showing that all of these approaches achieve some improvements in in-domain games, but only GRPO demonstrates the ability to generalise to out-of-domain games as well as retain competitive performance in reference-based tasks. We release the framework and the baseline training setups in the hope that this can foster research in this promising new direction.
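The heart of GRPO is its critic-free advantage estimate: each sampled completion's reward is standardized against its own group's mean and standard deviation. A minimal sketch of just that computation follows; the full GRPO objective also applies a clipped policy ratio and a KL penalty, omitted here.

```python
def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: standardize each sampled completion's reward
    against its own group's mean and (population) standard deviation."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four play-throughs of the same Dialogue Game episode, scored by the game's rules.
rewards = [1.0, 0.0, 0.5, 0.5]
advs = grpo_advantages(rewards)
print([round(a, 3) for a in advs])  # above-mean episodes get positive advantage
```

No learned value function is needed: the group itself supplies the baseline, which is what makes the setup cheap to run on sampled game episodes.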
https://arxiv.org/abs/2504.08590
There is a growing interest in assessing the personality traits of Large language models (LLMs). However, traditional personality assessments based on self-report questionnaires may fail to capture their true behavioral nuances due to inherent biases and meta-knowledge contamination. This paper introduces a novel multi-observer framework for LLM personality assessment that draws inspiration from informant-report methods in psychology. Instead of relying solely on self-assessments, our approach employs multiple observer agents configured with a specific relationship context (e.g., family, friend, or workplace) to simulate interactive scenarios with a subject LLM. These observers engage in dialogues and subsequently provide ratings across the Big Five personality dimensions. Our experiments reveal that LLMs possess systematic biases in self-report personality ratings. Moreover, aggregating observer ratings effectively reduces non-systematic biases and achieves optimal reliability with 5-7 observers. The findings highlight the significant impact of relationship context on personality perception and demonstrate that a multi-observer paradigm yields a more robust and context-sensitive evaluation of LLM personality traits.
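Why aggregating observers helps can be seen with a small simulation: if each observer's rating is the true trait score plus idiosyncratic (non-systematic) noise, the error of the mean rating shrinks roughly as 1/sqrt(n), with most of the gain already realized by a handful of observers. The noise model and numbers below are illustrative, not the paper's data.

```python
import random
import statistics

def simulate_assessment(true_score, n_observers, noise_sd=1.0, rng=None):
    """Each observer's rating = true trait score + idiosyncratic Gaussian noise;
    the aggregate assessment is the mean across observers."""
    rng = rng or random.Random(0)
    ratings = [true_score + rng.gauss(0, noise_sd) for _ in range(n_observers)]
    return statistics.mean(ratings)

def mean_abs_error(n_observers, trials=2000, true_score=3.5):
    """Average absolute error of the aggregate over many simulated assessments."""
    rng = random.Random(42)
    return statistics.mean(
        abs(simulate_assessment(true_score, n_observers, rng=rng) - true_score)
        for _ in range(trials)
    )

# Error shrinks roughly as 1/sqrt(n); diminishing returns set in around 5-7 observers.
for n in (1, 3, 5, 7, 15):
    print(f"{n:2d} observers -> mean |error| = {mean_abs_error(n):.3f}")
```

Note that averaging only cancels non-systematic noise; a bias shared by all observers (systematic bias) survives aggregation, which matches the paper's distinction.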
https://arxiv.org/abs/2504.08399
Recently, a growing number of experts in artificial intelligence (AI) and medicine have begun to suggest that the use of AI systems, particularly machine learning (ML) systems, is likely to humanise the practice of medicine by substantially improving the quality of clinician-patient relationships. In this thesis, however, I argue that medical ML systems are more likely to negatively impact these relationships than to improve them. In particular, I argue that the use of medical ML systems is likely to compromise the quality of trust, care, empathy, understanding, and communication between clinicians and patients.
https://arxiv.org/abs/2504.07763
Large language models (LLMs) demonstrate remarkable text comprehension and generation capabilities but often lack the ability to utilize up-to-date or domain-specific knowledge not included in their training data. To address this gap, we introduce KEDiT, an efficient method for fine-tuning LLMs for knowledge-grounded dialogue generation. KEDiT operates in two main phases: first, it employs an information bottleneck to compress retrieved knowledge into learnable parameters, retaining essential information while minimizing computational overhead. Second, a lightweight knowledge-aware adapter integrates these compressed knowledge vectors into the LLM during fine-tuning, updating less than 2% of the model parameters. The experimental results on the Wizard of Wikipedia and a newly constructed PubMed-Dialog dataset demonstrate that KEDiT excels in generating contextually relevant and informative responses, outperforming competitive baselines in automatic, LLM-based, and human evaluations. This approach effectively combines the strengths of pretrained LLMs with the adaptability needed for incorporating dynamic knowledge, presenting a scalable solution for fields such as medicine.
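One common way to realize such a bottleneck is learnable-query attention that pools a variable-length knowledge sequence into a fixed number of vectors. The numpy sketch below shows that pooling idea only; KEDiT's exact parameterization may differ, and the dimensions here are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compress_knowledge(K, queries):
    """Bottleneck a retrieved-knowledge sequence K (n_tokens x d) into one vector
    per learnable query (m x d) via scaled dot-product attention. Since m is fixed
    and m << n_tokens, downstream cost no longer grows with passage length."""
    d = K.shape[1]
    attn = softmax(queries @ K.T / np.sqrt(d))  # (m x n_tokens) attention weights
    return attn @ K                             # (m x d) compressed knowledge

rng = np.random.default_rng(0)
K = rng.normal(size=(512, 64))       # a long retrieved passage, already embedded
queries = rng.normal(size=(8, 64))   # the only trainable tensor in this sketch
Z = compress_knowledge(K, queries)
print(Z.shape)  # (8, 64): fixed-size knowledge vectors, ready for an adapter
```

In a full system, Z would be consumed by a small adapter inside the frozen LLM, which is how the trainable-parameter budget stays under a few percent.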
https://arxiv.org/abs/2504.07754
In recent years, accurately and quickly deploying medical large language models (LLMs) has become a significant trend. Among these, retrieval-augmented generation (RAG) has garnered significant attention due to its features of rapid deployment and privacy protection. However, existing medical RAG frameworks still have shortcomings. Most existing medical RAG frameworks are designed for single-round question answering tasks and are not suitable for multi-round diagnostic dialogue. On the other hand, existing medical multi-round RAG frameworks do not consider the interconnections between potential diseases to inquire precisely like a doctor. To address these issues, we propose a Multi-Round Diagnostic RAG (MRD-RAG) framework that mimics the doctor's diagnostic process. This RAG framework can analyze diagnosis information of potential diseases and accurately conduct multi-round diagnosis like a doctor. To evaluate the effectiveness of our proposed framework, we conduct experiments on two modern medical datasets and two traditional Chinese medicine datasets, with GPT and human doctors evaluating the different methods. The results indicate that our RAG framework can significantly enhance the diagnostic performance of LLMs, highlighting the potential of our approach in medical diagnosis. The code and data can be found on our project website at this https URL.
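The multi-round retrieve-then-inquire loop can be caricatured in a few lines: retrieve candidate diseases from the dialogue so far, then ask about a symptom that discriminates between the surviving candidates. Everything below is a toy sketch with a hypothetical three-entry knowledge base; real frameworks index clinical documents and use an LLM to phrase the questions.

```python
def retrieve(history, corpus, k=2):
    """Score each disease entry by keyword overlap with the whole dialogue so far."""
    words = set(" ".join(history).lower().split())
    scored = sorted(corpus.items(),
                    key=lambda kv: len(words & set(kv[1])), reverse=True)
    return [name for name, _ in scored[:k]]

def next_question(candidates, corpus, history):
    """Pick a symptom that discriminates between candidates and is not yet discussed."""
    asked = set(" ".join(history).lower().split())
    sets = [set(corpus[c]) for c in candidates]
    discriminative = (set.union(*sets) - set.intersection(*sets)) - asked
    return min(discriminative) if discriminative else None

corpus = {  # hypothetical knowledge base: disease -> characteristic symptoms
    "flu": ["fever", "cough", "aches"],
    "cold": ["cough", "sneezing", "congestion"],
    "allergy": ["sneezing", "itchy", "congestion"],
}
history = ["patient reports cough and congestion"]
for _ in range(3):  # multi-round loop: retrieve, then inquire like a doctor
    candidates = retrieve(history, corpus)
    q = next_question(candidates, corpus, history)
    if q is None:
        break
    history.append(f"asked about {q}; patient confirms {q}")
print(candidates)  # ranking sharpens as confirmed symptoms accumulate
```

The point of the sketch is the control flow: each confirmed answer feeds back into retrieval, so the candidate set (and the next question) changes round by round.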
https://arxiv.org/abs/2504.07724
Chat-oriented dialogue systems designed to provide tangible benefits, such as sharing the latest news or preventing frailty in senior citizens, often require Proactive acquisition of specific user Information via chats on user-faVOred Topics (PIVOT). This study proposes the PIVOT task, designed to advance the technical foundation for these systems. In this task, a system needs to acquire a user's answers to predefined questions, without making the user feel abrupt, while engaging in a chat on a predefined topic. We found that even recent large language models (LLMs) show a low success rate on the PIVOT task. We constructed a dataset suitable for analyzing this task and for developing more effective systems. Finally, we developed a simple but effective system for this task by incorporating insights obtained through the analysis of this dataset.
https://arxiv.org/abs/2504.07698