
Multimodal Emotion Recognition by Fusing Video Semantic in MOOC Learning Scenarios

2024-04-11 05:44:27
Yuan Zhang, Xiaomei Tao, Hanxu Ai, Tao Chen, Yanling Gan

Abstract

In Massive Open Online Course (MOOC) learning scenarios, learners acquire knowledge mainly by watching instructional videos, so the semantic information in those videos directly affects learners' emotional states. However, few studies have paid attention to this potential influence. To explore the impact of video semantic information on learners' emotions in depth, this paper proposes a novel multimodal emotion recognition method that fuses video semantic information with physiological signals. We generate video descriptions with a pre-trained large language model (LLM) to obtain high-level semantic information about the instructional videos. A cross-attention mechanism handles the modal interaction, fusing the semantic information with eye movement and photoplethysmography (PPG) signals to produce features that capture the critical information of all three modalities. An emotion classifier then recognizes learners' emotional states from these fused features. Experimental results show that our method significantly improves emotion recognition performance, offering a new perspective and an efficient approach for emotion recognition research in MOOC learning scenarios. The proposed method not only contributes to a deeper understanding of how instructional videos affect learners' emotional states but also provides a useful reference for future emotion recognition research in MOOC learning scenarios.
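
The fusion step described in the abstract lends itself to a compact sketch. The snippet below is a minimal, illustrative PyTorch implementation of cross-attention fusion across three modalities, not the authors' code: the class name CrossModalFusion, all feature dimensions, the bidirectional attention layout, and the four-class output are assumptions made for the example.

    import torch
    import torch.nn as nn

    class CrossModalFusion(nn.Module):
        """Illustrative cross-attention fusion of video-semantic, eye-movement,
        and PPG features, followed by an emotion classifier (dims are assumed)."""

        def __init__(self, d_model=128, n_heads=4, n_classes=4):
            super().__init__()
            # Project each modality into a shared embedding space.
            self.sem_proj = nn.Linear(768, d_model)  # e.g. an LLM text-embedding size
            self.eye_proj = nn.Linear(12, d_model)   # hypothetical eye-movement feature dim
            self.ppg_proj = nn.Linear(8, d_model)    # hypothetical PPG feature dim
            # Cross-attention in both directions between semantics and physiology.
            self.sem2phys = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.phys2sem = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.classifier = nn.Sequential(
                nn.LayerNorm(2 * d_model),
                nn.Linear(2 * d_model, n_classes),
            )

        def forward(self, sem, eye, ppg):
            # sem: (B, Ts, 768), eye: (B, Te, 12), ppg: (B, Tp, 8)
            sem = self.sem_proj(sem)
            phys = torch.cat([self.eye_proj(eye), self.ppg_proj(ppg)], dim=1)
            # Each stream queries the other, then is mean-pooled over time.
            s2p, _ = self.sem2phys(query=sem, key=phys, value=phys)
            p2s, _ = self.phys2sem(query=phys, key=sem, value=sem)
            fused = torch.cat([s2p.mean(dim=1), p2s.mean(dim=1)], dim=-1)
            return self.classifier(fused)  # emotion logits

    # Example with random inputs (batch of 2 learners):
    model = CrossModalFusion()
    logits = model(torch.randn(2, 16, 768), torch.randn(2, 50, 12), torch.randn(2, 50, 8))
    print(logits.shape)  # torch.Size([2, 4])

Projecting each modality into a shared space before attention is one common way to let queries from one modality attend over keys and values from another; the paper's actual layer sizes and attention arrangement may differ.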


URL

https://arxiv.org/abs/2404.07484

PDF

https://arxiv.org/pdf/2404.07484.pdf

