Common Ground Tracking in Multimodal Dialogue

2024-03-26 00:25:01
Ibrahim Khebour, Kenneth Lai, Mariah Bradford, Yifan Zhu, Richard Brutti, Christopher Tam, Jingxuan Tu, Benjamin Ibarra, Nathaniel Blanchard, Nikhil Krishnaswamy, James Pustejovsky


Within dialogue modeling research in AI and NLP, considerable attention has been devoted to "dialogue state tracking" (DST), the ability to update the representation of the speaker's needs at each turn in the dialogue by taking into account the past dialogue moves and history. Less studied, but just as important to dialogue modeling, is "common ground tracking" (CGT), which identifies the shared belief space held by all of the participants in a task-oriented dialogue: the task-relevant propositions that all participants accept as true. In this paper we present a method for automatically identifying the current set of shared beliefs and "questions under discussion" (QUDs) of a group with a shared goal. We annotate a dataset of multimodal interactions in a shared physical space with speech transcriptions, prosodic features, gestures, actions, and facets of collaboration, and operationalize these features for use in a deep neural model that predicts moves toward the construction of common ground. Model outputs cascade into a set of formal closure rules derived from situated evidence and belief axioms and update operations. We empirically assess the contribution of each feature type toward successful construction of common ground relative to ground truth, establishing a benchmark for this novel, challenging task.
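To make the cascade from predicted moves to a tracked common ground more concrete, the following is a minimal sketch in Python. It is not the authors' implementation: the move labels ("propose", "accept", "doubt"), the three-part state, the update rules, and the example propositions are all assumptions introduced here for illustration, and the abstract does not specify any of them. The sketch takes a sequence of (move, proposition) pairs, standing in for the outputs of a move classifier, and applies simple rule-based updates to a set of open questions, a set of pending proposals, and a set of accepted propositions.

```python
from dataclasses import dataclass, field


@dataclass
class GroupState:
    """Hypothetical three-part state; field names are illustrative only."""
    quds: set = field(default_factory=set)           # items still under discussion
    pending: set = field(default_factory=set)        # proposed, not yet accepted
    common_ground: set = field(default_factory=set)  # accepted by all participants


def apply_move(state: GroupState, move: str, prop: str) -> GroupState:
    """Apply one predicted move using simple, assumed update rules."""
    if move == "propose":
        # A proposal takes an item out of open discussion and marks it pending,
        # unless the group has already accepted it.
        if prop not in state.common_ground:
            state.quds.discard(prop)
            state.pending.add(prop)
    elif move == "accept":
        # Only something that was actually proposed can be accepted.
        if prop in state.pending:
            state.pending.discard(prop)
            state.common_ground.add(prop)
    elif move == "doubt":
        # Doubt reopens a pending or accepted item as a question.
        if prop in state.pending or prop in state.common_ground:
            state.pending.discard(prop)
            state.common_ground.discard(prop)
            state.quds.add(prop)
    else:
        raise ValueError(f"unknown move: {move}")
    return state


def track(moves, initial_quds):
    """Fold a sequence of (move, proposition) pairs into a final state."""
    state = GroupState(quds=set(initial_quds))
    for move, prop in moves:
        apply_move(state, move, prop)
    return state


if __name__ == "__main__":
    # Hypothetical propositions p1 and p2, both open at the start.
    predicted = [
        ("propose", "p1"),
        ("accept", "p1"),
        ("propose", "p2"),
        ("doubt", "p2"),
    ]
    final = track(predicted, {"p1", "p2"})
    print(sorted(final.common_ground), sorted(final.quds), sorted(final.pending))
```

In this toy run, p1 ends up in the common ground because it was proposed and then accepted, while p2 returns to the open questions because it was doubted before anyone accepted it. Keeping the update rules separate from the move predictions mirrors the two-stage structure described above, where a learned model feeds a rule-based update step.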




