HEU Emotion: A Large-scale Database for Multi-modal Emotion Recognition in the Wild

2020-07-24 13:36:52

Jing Chen (1), Chenhui Wang (2), Kejun Wang (1), Chaoqun Yin (1), Cong Zhao (1), Tao Xu (1), Xinyi Zhang (1), Ziqiang Huang (1), Meichen Liu (1), Tao Yang (1) ((1) College of Intelligent Systems Science and Engineering, Harbin Engineering University, Harbin, China, (2) UCLA Department of Statistics, Los Angeles, CA)

arXiv_CV

arXiv_CV Recognition Deep_Learning Attention Pose Emotion Speech

Abstract

The study of affective computing in the wild depends on databases. Existing multimodal emotion databases collected under real-world conditions are few and small, with a limited number of subjects and content expressed in a single language. To address this gap, we collected, annotated, and prepared for release a new natural-state video database called HEU Emotion. HEU Emotion contains 19,004 video clips in total, divided into two parts according to data source. The first part contains videos downloaded from Tumblr, Google, and Giphy, covering 10 emotions and two modalities (facial expression and body posture). The second part consists of clips manually extracted from movies, TV series, and variety shows, covering 10 emotions and three modalities (facial expression, body posture, and emotional speech). With 9,951 subjects, HEU Emotion is by far the most extensive multi-modal emotion database. To provide a benchmark for emotion recognition, we evaluated HEU Emotion with many conventional machine learning and deep learning methods. We also proposed a Multi-modal Attention module to fuse multi-modal features adaptively. After multi-modal fusion, the recognition accuracies for the two parts increased by 2.19% and 4.01%, respectively, over those of single-modal facial expression recognition.
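
The abstract does not describe the internals of the Multi-modal Attention module, so the following is only a generic illustration of what "fusing multi-modal features adaptively" can mean: each modality's feature vector receives a scalar score, the scores are normalized with a softmax, and the fused feature is the weighted sum. The function names (`softmax`, `attention_fuse`), the dot-product scoring, and the requirement that all modalities share one feature dimension are assumptions made for this sketch, not the authors' design.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, scoring_vectors):
    """Fuse per-modality feature vectors with softmax attention weights.

    features:        dict mapping modality name -> feature vector (list of floats)
    scoring_vectors: dict mapping modality name -> vector of the same length,
                     standing in for learned parameters (hypothetical here)
    Returns (weights, fused): a dict of per-modality weights and the fused vector.
    """
    names = sorted(features)
    dim = len(features[names[0]])
    if any(len(features[n]) != dim for n in names):
        raise ValueError("all modalities must share one feature dimension")

    # One scalar relevance score per modality (dot product with its scoring vector).
    scores = [
        sum(w * x for w, x in zip(scoring_vectors[n], features[n]))
        for n in names
    ]
    alphas = softmax(scores)

    # Weighted sum of the modality features, computed element by element.
    fused = [
        sum(a * features[n][i] for a, n in zip(alphas, names))
        for i in range(dim)
    ]
    return dict(zip(names, alphas)), fused
```

In a trained model the scoring vectors would be learned jointly with the classifier, so the weights shift toward whichever modality is more informative for a given clip; here they are plain inputs so the arithmetic stays visible.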

URL

https://arxiv.org/abs/2007.12519

PDF

https://arxiv.org/pdf/2007.12519.pdf