
CAGE: Circumplex Affect Guided Expression Inference

2024-04-23 12:30:17
Niklas Wagner, Felix Mätzler, Samed R. Vossberg, Helen Schneider, Svetlana Pavlitska, J. Marius Zöllner

Abstract

Understanding emotions and expressions is a task of interest across multiple disciplines, especially for improving user experiences. Contrary to common perception, it has been shown that emotions are not discrete entities but instead exist along a continuum. People understand discrete emotions differently due to a variety of factors, including cultural background, individual experiences, and cognitive biases. Therefore, most approaches to expression understanding, particularly those relying on discrete categories, are inherently biased. In this paper, we present a comparative in-depth analysis of two common datasets (AffectNet and EMOTIC) equipped with the components of the circumplex model of affect. Further, we propose a model for the prediction of facial expressions tailored for lightweight applications. Using a small-scale MaxViT-based model architecture, we evaluate the impact of training with discrete expression category labels in combination with continuous valence and arousal labels. We show that considering valence and arousal in addition to discrete category labels significantly improves expression inference. The proposed model outperforms the current state-of-the-art models on AffectNet, establishing it as the best-performing model for inferring valence and arousal, achieving a 7% lower RMSE. Training scripts and trained weights to reproduce our results can be found here: this https URL.
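
To make the described training setup concrete, here is a minimal PyTorch sketch of the idea in the abstract: a small MaxViT backbone with two heads, one classifying the eight discrete AffectNet expression categories and one regressing continuous valence and arousal, optimized with a combined loss. This is not the authors' released code; the timm backbone name, the head layout, the tanh output bounds, and the loss weight alpha are illustrative assumptions.

```python
# Minimal sketch of the multi-task setup the abstract describes, NOT the
# authors' released implementation: a small MaxViT backbone with one head
# for the eight discrete AffectNet expression categories and one for
# continuous valence/arousal.
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm


class ExpressionNet(nn.Module):
    def __init__(self, num_classes: int = 8):
        super().__init__()
        # "maxvit_tiny_tf_224" is one small MaxViT variant available in timm;
        # num_classes=0 makes the backbone return pooled features.
        self.backbone = timm.create_model(
            "maxvit_tiny_tf_224", pretrained=False, num_classes=0
        )
        feat_dim = self.backbone.num_features
        self.cls_head = nn.Linear(feat_dim, num_classes)  # discrete categories
        self.va_head = nn.Linear(feat_dim, 2)             # valence, arousal

    def forward(self, x):
        feats = self.backbone(x)
        # tanh bounds valence/arousal to [-1, 1], their usual scale
        return self.cls_head(feats), torch.tanh(self.va_head(feats))


def combined_loss(logits, va_pred, labels, va_true, alpha: float = 1.0):
    """Cross-entropy on categories plus MSE on valence/arousal;
    alpha is a hypothetical weighting factor."""
    return F.cross_entropy(logits, labels) + alpha * F.mse_loss(va_pred, va_true)


if __name__ == "__main__":
    model = ExpressionNet()
    x = torch.randn(2, 3, 224, 224)        # dummy image batch
    logits, va = model(x)                  # shapes: (2, 8), (2, 2)
    loss = combined_loss(
        logits, va,
        labels=torch.tensor([0, 3]),
        va_true=torch.rand(2, 2) * 2 - 1,  # random targets in [-1, 1]
    )
    print(logits.shape, va.shape, loss.item())
```

The paper reports the valence/arousal result as RMSE; at evaluation time that metric is simply `torch.sqrt(F.mse_loss(va_pred, va_true))` over the test set.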

URL

https://arxiv.org/abs/2404.14975

PDF

https://arxiv.org/pdf/2404.14975.pdf

