Abstract
Multimodal sentiment analysis (MSA) aims to understand human sentiment through multimodal data. Most MSA methods assume modality completeness. However, in real-world applications, practical factors cause uncertain modality missingness, which drastically degrades model performance. To this end, we propose a Correlation-decoupled Knowledge Distillation (CorrKD) framework for the MSA task under uncertain missing modalities. Specifically, we present a sample-level contrastive distillation mechanism that transfers comprehensive knowledge containing cross-sample correlations to reconstruct missing semantics. Moreover, we introduce a category-guided prototype distillation mechanism that captures cross-category correlations using category prototypes, aligning feature distributions and generating favorable joint representations. Finally, we design a response-disentangled consistency distillation strategy that optimizes the sentiment decision boundaries of the student network through response disentanglement and mutual information maximization. Comprehensive experiments on three datasets indicate that our framework achieves favorable improvements over several baselines.
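To make the idea of category prototypes concrete, here is a minimal sketch and not the CorrKD implementation. The abstract does not give the loss, so this sketch makes two assumptions: a prototype is the mean teacher feature of each category, and the student is penalized by the squared difference between its cosine similarity to each prototype and the teacher's. All function names are hypothetical.

```python
import math
from collections import defaultdict


def category_prototypes(features, labels):
    """Return one prototype per category: the mean feature vector of its samples."""
    sums, counts = {}, defaultdict(int)
    for vec, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def prototype_distillation_loss(teacher_feats, student_feats, labels):
    """Assumed loss: mean squared gap between teacher and student
    sample-to-prototype similarities (prototypes come from the teacher)."""
    protos = category_prototypes(teacher_feats, labels)
    categories = sorted(protos)
    total = 0.0
    for t_vec, s_vec in zip(teacher_feats, student_feats):
        for lab in categories:
            gap = cosine(t_vec, protos[lab]) - cosine(s_vec, protos[lab])
            total += gap * gap
    return total / (len(teacher_feats) * len(categories))
```

In this sketch the teacher would see complete modalities and the student would see inputs with missing modalities. The loss is zero when both networks place every sample identically relative to all category prototypes.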
URL
https://arxiv.org/abs/2404.16456