
Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios

2024-04-11 05:44:27
Yuan Zhang, Xiaomei Tao, Hanxu Ai, Tao Chen, Yanling Gan

Abstract

In Massive Open Online Course (MOOC) learning scenarios, learners acquire knowledge mainly by watching instructional videos, so the semantic information in those videos directly affects learners' emotional states. However, few studies have paid attention to this potential influence. To explore the impact of video semantic information on learners' emotions in depth, this paper proposes a novel multimodal emotion recognition method that fuses video semantic information with physiological signals. We generate video descriptions with a pre-trained large language model (LLM) to obtain high-level semantic information about the instructional videos. A cross-attention mechanism handles the modal interaction, fusing the semantic information with eye movement and photoplethysmography (PPG) signals to produce features that capture the critical information of all three modalities. An emotion classifier then recognizes learners' emotional states from these fused features. Experimental results show that our method significantly improves emotion recognition performance, offering a new perspective and an efficient approach for emotion recognition research in MOOC learning scenarios. The proposed method not only contributes to a deeper understanding of how instructional videos affect learners' emotional states but also provides a useful reference for future emotion recognition research in MOOC learning scenarios.
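
The fusion step described in the abstract lends itself to a compact sketch. The snippet below is a minimal, illustrative PyTorch implementation of cross-attention fusion across three modalities, not the authors' code: the class name CrossModalFusion, all feature dimensions, the bidirectional attention layout, and the four-class output are assumptions made for the example.

    import torch
    import torch.nn as nn

    class CrossModalFusion(nn.Module):
        """Illustrative cross-attention fusion of video-semantic, eye-movement,
        and PPG features, followed by an emotion classifier (dims are assumed)."""

        def __init__(self, d_model=128, n_heads=4, n_classes=4):
            super().__init__()
            # Project each modality into a shared embedding space.
            self.sem_proj = nn.Linear(768, d_model)  # e.g. an LLM text-embedding size
            self.eye_proj = nn.Linear(12, d_model)   # hypothetical eye-movement feature dim
            self.ppg_proj = nn.Linear(8, d_model)    # hypothetical PPG feature dim
            # Cross-attention in both directions between semantics and physiology.
            self.sem2phys = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.phys2sem = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.classifier = nn.Sequential(
                nn.LayerNorm(2 * d_model),
                nn.Linear(2 * d_model, n_classes),
            )

        def forward(self, sem, eye, ppg):
            # sem: (B, Ts, 768), eye: (B, Te, 12), ppg: (B, Tp, 8)
            sem = self.sem_proj(sem)
            phys = torch.cat([self.eye_proj(eye), self.ppg_proj(ppg)], dim=1)
            # Each stream queries the other, then is mean-pooled over time.
            s2p, _ = self.sem2phys(query=sem, key=phys, value=phys)
            p2s, _ = self.phys2sem(query=phys, key=sem, value=sem)
            fused = torch.cat([s2p.mean(dim=1), p2s.mean(dim=1)], dim=-1)
            return self.classifier(fused)  # emotion logits

    # Example with random inputs (batch of 2 learners):
    model = CrossModalFusion()
    logits = model(torch.randn(2, 16, 768), torch.randn(2, 50, 12), torch.randn(2, 50, 8))
    print(logits.shape)  # torch.Size([2, 4])

Projecting each modality into a shared space before attention is one common way to let queries from one modality attend over keys and values from another; the paper's actual layer sizes and attention arrangement may differ.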


URL

https://arxiv.org/abs/2404.07484

PDF

https://arxiv.org/pdf/2404.07484.pdf

