Aff-Wild Database and AffWildNet

2019-10-11 17:24:45
Mengyao Liu, Dimitrios Kollias

Abstract

In the context of human-computer interaction (HCI), building an automatic system that recognizes affect from human facial expressions in real-world conditions is crucial for making machines interact naturally with people. However, existing facial emotion databases usually contain expressions captured in limited scenarios under well-controlled conditions. Aff-Wild is currently the largest database of spontaneous facial expressions in the wild annotated with valence and arousal. The first contribution of this project is the completed extension of the Aff-Wild database, accomplished by collecting YouTube videos that show spontaneous facial expressions in the wild, annotating them with valence and arousal values in [-1, 1], detecting faces in the frames with the FFLD2 detector, and partitioning the whole dataset into training, validation and test sets of 527,056, 94,223 and 135,145 frames, respectively. Diversity is guaranteed with respect to age, ethnicity and the values of valence and arousal, and the male-to-female ratio is close to 1. Regarding the techniques used to build the automatic system, deep learning stands out, since almost all winning methods in emotion challenges adopt DNN techniques. The second contribution of this project is an end-to-end DNN with joint CNN and RNN blocks that estimates valence and arousal for each frame of sequential data. VGGFace, ResNet and DenseNet with the corresponding pre-trained models are tried for the CNN block, and LSTM, GRU, IndRNN and an attention mechanism for the RNN block, with the aim of finding the best combination. Fine-tuning and transfer-learning techniques are also experimented with. By comparing the CCC evaluation values on the test data, the best model is found to be pre-trained VGGFace connected to a 2-layer GRU with an attention mechanism. The model's test performance is 0.555 CCC for valence with sequence length 80 and 0.499 CCC for arousal with sequence length 70.
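To make the described CNN-RNN design concrete, below is a minimal PyTorch sketch of a model of this kind: a pre-trained CNN backbone (e.g. a VGGFace feature extractor) produces per-frame features, a 2-layer GRU models temporal context, and a soft attention layer supplies a sequence-level context to a per-frame regression head bounded to [-1, 1] by tanh. The feature and hidden sizes, the exact form of the attention, and the output squashing are illustrative assumptions, not the paper's confirmed configuration.

import torch
import torch.nn as nn

class CnnGruAttention(nn.Module):
    """Sketch of a joint CNN-RNN valence/arousal regressor (assumed details)."""

    def __init__(self, backbone, feat_dim=4096, hidden=128):
        super().__init__()
        self.backbone = backbone                      # per-frame CNN feature extractor
        self.gru = nn.GRU(feat_dim, hidden, num_layers=2, batch_first=True)
        self.attn = nn.Linear(hidden, 1)              # scalar attention score per time step
        self.head = nn.Linear(hidden, 2)              # valence and arousal

    def forward(self, frames):
        # frames: (batch, time, channels, height, width)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.gru(feats)                      # (b, t, hidden)
        weights = torch.softmax(self.attn(out), dim=1)
        context = (weights * out).sum(dim=1, keepdim=True)   # sequence-level context
        fused = out + context                         # broadcast context to every frame
        return torch.tanh(self.head(fused))           # (b, t, 2), values in (-1, 1)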
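The Concordance Correlation Coefficient (CCC) used for evaluation is a standard agreement metric, CCC = 2*s_xy / (s_x^2 + s_y^2 + (x_mean - y_mean)^2); it combines Pearson correlation with penalties for differences in mean and variance, so it reaches 1 only for perfect agreement. A short NumPy implementation:

import numpy as np

def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()       # population variances (ddof=0)
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)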

URL

https://arxiv.org/abs/1910.05318

PDF

https://arxiv.org/pdf/1910.05318.pdf

