Paper Reading AI Learner

LBVCNN: Local Binary Volume Convolutional Neural Network for Facial Expression Recognition from Image Sequences

2019-04-16 13:19:59
Sudhakar Kumawat, Manisha Verma, Shanmuganathan Raman

Abstract

Recognizing facial expressions is one of the central problems in computer vision. Temporal image sequences have useful spatio-temporal features for recognizing expressions. In this paper, we propose a new 3D Convolution Neural Network (CNN) that can be trained end-to-end for facial expression recognition on temporal image sequences without using facial landmarks. More specifically, a novel 3D convolutional layer that we call Local Binary Volume (LBV) layer is proposed. The LBV layer, when used with our newly proposed LBVCNN network, achieve comparable results compared to state-of-the-art landmark-based or without landmark-based models on image sequences from CK+, Oulu-CASIA, and UNBC McMaster shoulder pain datasets. Furthermore, our LBV layer reduces the number of trainable parameters by a significant amount when compared to a conventional 3D convolutional layer. As a matter of fact, when compared to a 3x3x3 conventional 3D convolutional layer, the LBV layer uses 27 times less trainable parameters.

Abstract (translated)

识别面部表情是计算机视觉的核心问题之一。时间图像序列具有识别表达式的有用时空特征。本文提出了一种新的三维卷积神经网络(CNN),它可以在不使用面部标志物的情况下,对时序图像序列进行端到端的面部表情识别训练。更具体地说,提出了一种新的称为局部二值体(LBV)层的三维卷积层。当LBV层与我们新提出的LBVCNN网络一起使用时,根据CK+、Oulu Casia和UNBC McMaster肩部疼痛数据集中的图像序列,与基于或不基于地标的最先进模型相比,可获得类似的结果。此外,与传统的三维卷积层相比,我们的LBV层可以显著减少可训练参数的数量。事实上,与3x3x3x3常规三维卷积层相比,LBV层使用的可训练参数少27倍。

URL

https://arxiv.org/abs/1904.07647

PDF

https://arxiv.org/pdf/1904.07647.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot