FER-YOLO-Mamba: Facial Expression Detection and Classification Based on Selective State Space

2024-05-03 03:20:37
Hui Ma, Sen Lei, Turgay Celik, Heng-Chao Li

Abstract

Facial Expression Recognition (FER) plays a pivotal role in understanding human emotional cues. However, traditional FER methods based on visual information are limited by their reliance on separate preprocessing, feature extraction, and multi-stage classification procedures, which increase computational complexity and demand substantial computing resources. Convolutional Neural Network (CNN)-based FER schemes often fail to capture the deep, long-distance dependencies embedded in facial expression images, while Transformers carry an inherent quadratic computational complexity. To address both issues, this paper presents the FER-YOLO-Mamba model, which integrates the principles of Mamba and YOLO to perform facial expression recognition and localization efficiently within a single model. Within FER-YOLO-Mamba, we further devise a FER-YOLO-VSS dual-branch module, which combines the strength of convolutional layers in local feature extraction with the capability of State Space Models (SSMs) to reveal long-distance dependencies. To the best of our knowledge, this is the first Vision Mamba model designed for facial expression detection and classification. To evaluate the proposed model, we conducted experiments on two benchmark datasets, RAF-DB and SFEW. The experimental results indicate that FER-YOLO-Mamba achieved better results than the other models compared. The code is available online.
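
The dual-branch idea in the abstract, a convolutional branch for local features alongside a state space branch for long-distance dependencies, can be illustrated with a minimal one-dimensional sketch. This is not the authors' FER-YOLO-VSS module: the function names (`conv1d_same`, `ssm_scan`, `dual_branch`), the scalar parameters `a`, `b`, `c`, and the choice to sum the two branches are all assumptions made for illustration. Mamba additionally makes its state space parameters depend on the input, which this sketch omits by keeping them fixed.

```python
def conv1d_same(x, kernel):
    """Local branch: 1-D sliding-window filter with zero padding.

    Assumes an odd-length kernel, so the output has the same length as x.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(k)) for i in range(len(x))]


def ssm_scan(x, a, b, c):
    """Long-range branch: scalar linear state space recurrence.

    h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
    A single pass over the sequence, so cost grows linearly with its length.
    """
    h = 0.0
    ys = []
    for x_t in x:
        h = a * h + b * x_t  # the state carries information from all earlier steps
        ys.append(c * h)
    return ys


def dual_branch(x, kernel, a, b, c):
    """Hypothetical fusion: element-wise sum of the two branch outputs."""
    local = conv1d_same(x, kernel)
    long_range = ssm_scan(x, a, b, c)
    return [l + g for l, g in zip(local, long_range)]


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
    print(dual_branch(impulse, kernel=[0.25, 0.5, 0.25], a=0.5, b=1.0, c=1.0))
    # prints [1.5, 0.75, 0.25, 0.125, 0.0625]
```

With an impulse input, the convolution branch contributes only within its three-sample window, while the recurrence keeps contributing a decaying response at every later position. That contrast is the sense in which the two branches complement each other.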

URL

https://arxiv.org/abs/2405.01828

PDF

https://arxiv.org/pdf/2405.01828.pdf

