Abstract
Recognizing facial expressions is one of the central problems in computer vision. Temporal image sequences carry spatio-temporal features that are useful for recognizing expressions. In this paper, we propose a new 3D Convolutional Neural Network (CNN) that can be trained end-to-end for facial expression recognition on temporal image sequences without using facial landmarks. More specifically, we propose a novel 3D convolutional layer that we call the Local Binary Volume (LBV) layer. The LBV layer, when used with our newly proposed LBVCNN network, achieves results comparable to state-of-the-art models, both landmark-based and landmark-free, on image sequences from the CK+, Oulu-CASIA, and UNBC McMaster shoulder pain datasets. Furthermore, our LBV layer significantly reduces the number of trainable parameters compared to a conventional 3D convolutional layer: compared to a conventional 3x3x3 3D convolutional layer, the LBV layer uses 27 times fewer trainable parameters.
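The 27x figure can be checked with a little arithmetic. The sketch below counts the weights of a conventional 3x3x3 3D convolution and compares them with a layer that keeps a single trainable weight per input/output channel pair. That second count is an assumption chosen to match the stated ratio, not a description of the LBV layer's internals, which the abstract does not give. The function names and the channel sizes (64 and 128) are hypothetical, and bias terms are ignored.

```python
def conv3d_weight_count(in_channels, out_channels, kernel=(3, 3, 3)):
    """Trainable weights in a conventional 3D convolution (bias ignored)."""
    kd, kh, kw = kernel
    return in_channels * out_channels * kd * kh * kw


def reduced_weight_count(in_channels, out_channels):
    """Assumed count for a layer with one trainable weight per channel pair.

    This is an illustrative assumption that reproduces the 27x ratio quoted
    in the abstract; it is not taken from the paper's layer definition.
    """
    return in_channels * out_channels


# Hypothetical layer sizes, chosen only for illustration.
conv_weights = conv3d_weight_count(64, 128)
reduced_weights = reduced_weight_count(64, 128)
ratio = conv_weights // reduced_weights

print(conv_weights, reduced_weights, ratio)
```

The ratio equals the kernel volume (3 x 3 x 3 = 27) and does not depend on the channel counts, which is why the abstract can state it without reference to a particular layer size.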
Abstract (translated)
识别面部表情是计算机视觉的核心问题之一。时间图像序列具有对表情识别有用的时空特征。本文提出了一种新的三维卷积神经网络(CNN),它可以在不使用面部关键点的情况下,对时序图像序列进行端到端的面部表情识别训练。更具体地说,我们提出了一种新的三维卷积层,称为局部二值体(LBV)层。当LBV层与我们新提出的LBVCNN网络一起使用时,在CK+、Oulu-CASIA和UNBC McMaster肩部疼痛数据集的图像序列上,可取得与最先进的模型(无论是否基于面部关键点)相当的结果。此外,与传统的三维卷积层相比,我们的LBV层可以显著减少可训练参数的数量:与传统的3x3x3三维卷积层相比,LBV层使用的可训练参数减少到其1/27。
URL
https://arxiv.org/abs/1904.07647