Abstract
Accurately modeling affect dynamics, the changes and fluctuations in emotions and affective displays over the course of human conversations, is crucial for understanding human interactions. Analyzing affect dynamics yields insights into how people communicate, respond to different situations, and form relationships. However, modeling affect dynamics is challenging because of contextual factors such as the complex and nuanced nature of interpersonal relationships, the situation, and other influences on affective displays. To address this challenge, we propose the Cross-person Memory Transformer (CPM-T), a framework that explicitly models affect dynamics (both intrapersonal and interpersonal influences) by identifying verbal and non-verbal cues, and that incorporates a large language model to exploit pre-trained knowledge for verbal reasoning. CPM-T maintains memory modules that store and update context within the conversation window, enabling the model to capture dependencies between earlier and later parts of a conversation. The framework also employs cross-modal attention to align information across modalities and cross-person attention to align behaviors in multi-party interactions. We evaluate the effectiveness and generalizability of our approach on three publicly available datasets, covering joint-engagement, rapport, and human-belief prediction tasks, where CPM-T outperforms baseline models in average F1-score by up to 7.3%, 9.3%, and 2.0%, respectively. Finally, ablation studies demonstrate the importance of each component of the framework with respect to multimodal temporal behavior.
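The abstract does not give the exact formulation of the cross-modal or cross-person attention layers; as a rough illustration of the general mechanism being described, the following is a minimal sketch of scaled dot-product cross-attention aligning one feature stream against another. All names, shapes, and the NumPy-only implementation are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values, d_k):
    # queries: (Tq, d) features from one stream (e.g. one person, or one modality)
    # keys_values: (Tk, d) features from the other stream to align against
    scores = queries @ keys_values.T / np.sqrt(d_k)   # (Tq, Tk) affinities
    weights = softmax(scores, axis=-1)                # each query row sums to 1
    return weights @ keys_values                      # (Tq, d) aligned features

rng = np.random.default_rng(0)
person_a = rng.normal(size=(5, 8))  # hypothetical behavior features of person A
person_b = rng.normal(size=(7, 8))  # hypothetical behavior features of person B
fused = cross_attention(person_a, person_b, d_k=8)
print(fused.shape)  # (5, 8): person A's timeline, re-expressed over person B's cues
```

The same operation serves both roles in such frameworks: with queries and keys drawn from two modalities it aligns multimodal information, and with queries and keys drawn from two speakers it aligns behaviors across people.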
URL
https://arxiv.org/abs/2305.12369