Abstract
Multi-modal Emotion Recognition in Conversation (MERC) has received considerable attention in various fields, e.g., human-computer interaction and recommendation systems. Most existing works perform feature disentanglement and fusion to extract emotional contextual information from multi-modal features for emotion classification. After revisiting the characteristics of MERC, we argue that long-range contextual semantic information should be extracted in the feature disentanglement stage and that inter-modal semantic information consistency should be maximized in the feature fusion stage. Recent State Space Models (SSMs), exemplified by Mamba, can efficiently model long-distance dependencies. Therefore, in this work, we build on the above insights to further improve the performance of MERC. Specifically, on the one hand, in the feature disentanglement stage, we propose a Broad Mamba, which does not rely on a self-attention mechanism for sequence modeling but instead uses a state space model to compress emotional representations and a broad learning system to explore the potential data distribution in broad space. Different from previous SSMs, we design a bidirectional SSM convolution to extract global context information. On the other hand, we design a multi-modal fusion strategy based on probability guidance to maximize the consistency of information between modalities. Experimental results show that the proposed method can overcome the computational and memory limitations of Transformers when modeling long-distance contexts, and has great potential to become a next-generation general architecture for MERC.
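The bidirectional SSM idea from the abstract can be illustrated with a minimal sketch: a discretized linear state space recurrence is scanned over the sequence in both directions so that every position sees both past and future context. This is a generic, hypothetical illustration, not the paper's implementation; the actual Broad Mamba uses selective (input-dependent) SSM parameters and hardware-aware scans, and the matrices `A`, `B`, `C` here are stand-ins.

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """One-directional discretized linear SSM:
    h_t = A h_{t-1} + B u_t,  y_t = C h_t."""
    T = u.shape[0]
    h = np.zeros(A.shape[0])
    ys = np.empty((T, C.shape[0]))
    for t in range(T):
        h = A @ h + B @ u[t]
        ys[t] = C @ h
    return ys

def bidirectional_ssm(u, A, B, C):
    """Scan forward and backward over the sequence and sum the outputs,
    so each position aggregates global (two-sided) context."""
    fwd = ssm_scan(u, A, B, C)
    bwd = ssm_scan(u[::-1], A, B, C)[::-1]
    return fwd + bwd

# Toy usage: a length-6 sequence of 4-dim features, 8-dim hidden state.
rng = np.random.default_rng(0)
A = 0.9 * np.eye(8)                   # stable, non-selective transition (assumption)
B = 0.1 * rng.normal(size=(8, 4))
C = rng.normal(size=(4, 8))
u = rng.normal(size=(6, 4))
y = bidirectional_ssm(u, A, B, C)     # shape (6, 4)
```

Unlike a causal scan, perturbing the last input changes the output at position 0, which is exactly the global-context property the bidirectional design targets.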
URL
https://arxiv.org/abs/2404.17858