Paper Reading AI Learner

Trustworthy Multimodal Fusion for Sentiment Analysis in Ordinal Sentiment Space

2024-04-13 08:15:57
Zhuyang Xie, Yan Yang, Jie Wang, Xiaorong Liu, Xiaofan Li

Abstract

Multimodal video sentiment analysis aims to integrate information from multiple modalities to analyze a speaker's opinions and attitudes. Most previous work focuses on exploring intra- and inter-modality semantic interactions. However, these works ignore the reliability of the modalities: a modality may contain noise or semantic ambiguity, or may be missing entirely. In addition, previous multimodal approaches treat the different modalities equally, largely ignoring their differing contributions. Furthermore, existing multimodal sentiment analysis methods directly regress sentiment scores without considering the ordinal relationships among sentiment categories, which limits performance. To address these problems, we propose a trustworthy multimodal sentiment ordinal network (TMSON) to improve sentiment analysis performance. Specifically, we first devise a unimodal feature extractor for each modality to obtain modality-specific features. Then, a customized uncertainty distribution estimation network estimates the unimodal uncertainty distributions. Next, Bayesian fusion is performed on the learned unimodal distributions to obtain multimodal distributions for sentiment prediction. Finally, an ordinal-aware sentiment space is constructed, in which ordinal regression constrains the multimodal distributions. TMSON outperforms baselines on multimodal sentiment analysis tasks, and empirical results demonstrate that it reduces uncertainty and yields more robust predictions.
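The fusion step the abstract describes — combining per-modality uncertainty distributions into one multimodal distribution — can be sketched as precision-weighted fusion of independent Gaussian estimates, a standard form of Bayesian fusion. This is a minimal illustration, not the paper's actual method: the modality values, uncertainties, and ordinal thresholds below are hypothetical.

```python
import numpy as np

def fuse_gaussians(mus, sigmas):
    """Bayesian fusion of independent Gaussian estimates N(mu_m, sigma_m^2).

    The fused precision is the sum of the unimodal precisions, and the
    fused mean is the precision-weighted average of the unimodal means,
    so low-uncertainty modalities dominate the result.
    """
    precisions = 1.0 / np.square(sigmas)
    fused_var = 1.0 / precisions.sum()
    fused_mu = fused_var * (precisions * mus).sum()
    return fused_mu, np.sqrt(fused_var)

# Hypothetical unimodal sentiment predictions (e.g. text, audio, vision):
# each modality outputs a mean score and an uncertainty (std. deviation).
mus = np.array([1.2, 0.4, 0.9])
sigmas = np.array([0.3, 1.0, 0.5])

mu, sigma = fuse_gaussians(mus, sigmas)
# The fused mean is pulled toward the confident (low-sigma) modality,
# and the fused sigma is smaller than any single modality's uncertainty.

# Hypothetical ordinal boundaries mapping the continuous score to a
# sentiment category, reflecting the ordinal-aware sentiment space.
thresholds = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
category = int(np.searchsorted(thresholds, mu))
```

Note how fusing with precisions rather than simple averaging gives the desired "trustworthy" behavior: an unreliable modality (large sigma) contributes little, so noisy or ambiguous inputs cannot drag the fused prediction far.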


URL

https://arxiv.org/abs/2404.08923

PDF

https://arxiv.org/pdf/2404.08923.pdf

