Abstract
Multimodal emotion recognition in conversation (MERC) has recently attracted substantial research attention. Existing MERC methods face three challenges: (1) they do not fully exploit direct inter-modal cues, which can leave cross-modal modeling incomplete; (2) they extract information from the same and from different modalities simultaneously at each network layer, which can cause conflicts when multi-source data are fused; (3) they are not sensitive enough to dynamic sentiment changes, which can lead to misclassification of utterances with abrupt sentiment shifts. To address these issues, a novel approach named GraphSmile is proposed for tracking intricate emotional cues in multimodal dialogues. GraphSmile comprises two key components, the GSF and SDP modules. GSF uses graph structures to assimilate inter-modal and intra-modal emotional dependencies alternately, layer by layer, capturing cross-modal cues while avoiding fusion conflicts. SDP is an auxiliary task that explicitly models the sentiment dynamics between utterances, improving the model's ability to distinguish sentiment differences. Furthermore, GraphSmile extends readily to multimodal sentiment analysis in conversation (MSAC), yielding a unified multimodal affective model that can perform both MERC and MSAC tasks. Empirical results on multiple benchmarks demonstrate that GraphSmile can handle complex emotional and sentiment patterns and significantly outperforms baseline models.
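The two ideas in the abstract, alternating inter-modal and intra-modal propagation and an auxiliary signal for sentiment change between utterances, can be illustrated with a toy sketch. This is not the GraphSmile implementation: the function names, the fixed mixing weight `alpha`, the context `window`, and the use of plain averaging in place of learned graph layers are all assumptions made for illustration. It also assumes that every modality has already been projected to the same feature dimension and that at least two modalities are present.

```python
from typing import Dict, List

# Hypothetical container: modality name -> one feature vector per utterance.
Features = Dict[str, List[List[float]]]


def mean(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def inter_modal_step(feats: Features, alpha: float = 0.5) -> Features:
    """Mix each node with the same utterance seen through the other modalities."""
    out: Features = {}
    for m, seq in feats.items():
        others = [k for k in feats if k != m]
        out[m] = []
        for i, own in enumerate(seq):
            cross = mean([feats[k][i] for k in others])
            out[m].append([(1 - alpha) * a + alpha * b for a, b in zip(own, cross)])
    return out


def intra_modal_step(feats: Features, window: int = 1) -> Features:
    """Average each node with nearby utterances of the same modality."""
    return {
        m: [mean(seq[max(0, i - window): i + window + 1]) for i in range(len(seq))]
        for m, seq in feats.items()
    }


def alternate(feats: Features, num_layers: int = 2) -> Features:
    """Apply the inter-modal and intra-modal steps in turn, layer by layer."""
    for _ in range(num_layers):
        feats = inter_modal_step(feats)
        feats = intra_modal_step(feats)
    return feats


def shift_labels(sentiments: List[str]) -> List[int]:
    """Assumed auxiliary targets: 1 if sentiment changes between neighbours."""
    return [int(a != b) for a, b in zip(sentiments, sentiments[1:])]
```

The point of the sketch is only the ordering: the two kinds of dependency are never mixed inside a single step, which is the property the abstract credits with avoiding fusion conflicts. In a real model the averaging would be replaced by learned graph layers, and the shift labels would supervise an additional prediction head.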
URL
https://arxiv.org/abs/2407.21536