Paper Reading AI Learner

Emotion Recognition from the perspective of Activity Recognition

2024-03-24 18:53:57
Savinay Nagendra, Prapti Panigrahi

Abstract

Efficient emotion recognition systems have applications in several domains, such as medicine, driver fatigue surveillance, social robotics, and human-computer interaction. Human emotional states, behaviors, and reactions displayed in real-world settings can be appraised using latent continuous dimensions. Continuous dimensional models of human affect, such as those based on valence and arousal, describe a broad range of spontaneous everyday emotions more accurately than traditional models of discrete stereotypical emotion categories (e.g., happiness, surprise). Most prior work on estimating valence and arousal considers laboratory settings and acted data. However, for emotion recognition systems to be deployed and integrated into real-world mobile and computing devices, we need to consider data collected in the wild. Action recognition is a domain of computer vision that involves capturing complementary information on appearance from still frames and motion between frames. In this paper, we treat emotion recognition from the perspective of action recognition by applying deep learning architectures designed specifically for action recognition to continuous affect recognition. We propose a novel three-stream end-to-end deep learning regression pipeline with an attention mechanism, an ensemble design based on sub-modules of multiple state-of-the-art action recognition systems. The pipeline includes a novel data pre-processing approach that uses a spatial self-attention mechanism to extract keyframes. The optical flow of high-attention regions of the face is extracted to capture temporal context. Comparative experiments are conducted on the AFEW-VA in-the-wild dataset. Quantitative analysis shows that the proposed model outperforms multiple standard baselines from both emotion recognition and action recognition.
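
The abstract names three ingredients: attention-based keyframe extraction, a motion signal, and a multi-stream regression onto valence and arousal. The sketch below illustrates how such pieces could fit together; it is not the authors' implementation. All function names are hypothetical, the attention scores are assumed to be given, plain frame differencing is swapped in for optical flow, and averaging the stream outputs is an assumed fusion rule that the abstract does not specify.

```python
from typing import List, Sequence, Tuple

# Grayscale image as rows of pixel intensities (illustrative type only).
Frame = List[List[float]]


def select_keyframes(attention_scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring frames, in temporal order."""
    ranked = sorted(
        range(len(attention_scores)),
        key=lambda i: attention_scores[i],
        reverse=True,
    )
    return sorted(ranked[:k])


def motion_energy(prev: Frame, curr: Frame) -> float:
    """Mean absolute pixel difference between two frames.

    This is a crude stand-in for optical flow, used only to keep the
    sketch dependency-free.
    """
    total = 0.0
    count = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += abs(c - p)
            count += 1
    return total / count if count else 0.0


def fuse_streams(preds: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Average per-stream (valence, arousal) predictions (assumed late fusion)."""
    n = len(preds)
    valence = sum(p[0] for p in preds) / n
    arousal = sum(p[1] for p in preds) / n
    return valence, arousal


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.3, 0.8, 0.2]           # hypothetical per-frame attention
    print(select_keyframes(scores, k=2))
    print(motion_energy([[0.0, 0.0]], [[1.0, 1.0]]))
    print(fuse_streams([(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]))
```

In a real system each stream would be a trained network and the motion input would come from a proper optical flow estimator; the helpers here only show the data flow from frames to a single (valence, arousal) pair.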

URL

https://arxiv.org/abs/2403.16263

PDF

https://arxiv.org/pdf/2403.16263.pdf

