Paper Reading AI Learner

Resource-Limited Joint Multimodal Sentiment Reasoning and Classification via Chain-of-Thought Enhancement and Distillation

2025-08-07 10:23:14
Haonan Shangguan, Xiaocui Yang, Shi Feng, Daling Wang, Yifei Zhang, Ge Yu

Abstract

The surge of rich multimodal content on social media platforms has greatly advanced Multimodal Sentiment Analysis (MSA), and Large Language Models (LLMs) have further accelerated progress in this field. Current approaches primarily leverage the knowledge and reasoning capabilities of parameter-heavy (Multimodal) LLMs for sentiment classification, overlooking the autonomous generation of multimodal sentiment reasoning in resource-constrained environments. We therefore focus on the Resource-Limited Joint Multimodal Sentiment Reasoning and Classification task (JMSRC), which performs multimodal sentiment reasoning-chain generation and sentiment classification simultaneously with only a lightweight model. We propose MulCoT-RD, a Multimodal Chain-of-Thought Reasoning Distillation model for JMSRC that employs a "Teacher-Assistant-Student" distillation paradigm to address deployment constraints in resource-limited environments. We first leverage a high-performance Multimodal Large Language Model (MLLM) to generate an initial reasoning dataset and train a medium-sized assistant model with a multi-task learning mechanism. A lightweight student model is then jointly trained to perform efficient multimodal sentiment reasoning generation and classification. Extensive experiments on four datasets demonstrate that MulCoT-RD, with only 3B parameters, achieves strong performance on JMSRC while exhibiting robust generalization and enhanced interpretability.
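The three-stage pipeline described in the abstract can be sketched in code. This is a minimal, illustrative sketch only: the function names, the sample format, and the weighted-sum form of the multi-task objective are assumptions for exposition, not the authors' actual implementation.

```python
# Hypothetical sketch of the "Teacher-Assistant-Student" CoT distillation
# pipeline from the abstract. All names and the loss weighting are
# illustrative assumptions.

samples = [{"text": "great concert!", "image": "img1.jpg", "label": "positive"}]

def teacher_generate_rationales(samples):
    """Stage 1: a high-capacity teacher MLLM (simulated here) annotates
    each multimodal sample with a chain-of-thought rationale, producing
    the initial reasoning dataset."""
    return [dict(s, rationale=f"The post '{s['text']}' expresses {s['label']} sentiment.")
            for s in samples]

def multitask_loss(gen_loss, cls_loss, alpha=0.5):
    """Stages 2-3: the assistant, and then the student, are trained on
    (input, rationale, label) triples with a joint objective combining
    reasoning-chain generation and sentiment classification (assumed here
    to be a simple weighted sum)."""
    return alpha * gen_loss + (1 - alpha) * cls_loss

dataset = teacher_generate_rationales(samples)
loss = multitask_loss(gen_loss=1.2, cls_loss=0.8)  # -> 1.0 with alpha=0.5
```

In the actual model, `gen_loss` and `cls_loss` would be, e.g., a token-level generation loss over the rationale and a classification loss over sentiment labels, computed by the 3B-parameter student.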

URL

https://arxiv.org/abs/2508.05234

PDF

https://arxiv.org/pdf/2508.05234.pdf
