Multi-Granularity Network with Modal Attention for Dense Affective Understanding

2021-06-18 07:37:06

Baoming Yan, Lin Wang, Ke Gao, Bo Gao, Xiao Liu, Chao Ban, Jiang Yang, Xiaobo Li

arXiv_CV

arXiv_CV Attention Salient Recommendation Relation Prediction Pose

Abstract

Video affective understanding, which aims to predict the expressions evoked by video content, is desirable for video creation and recommendation. The recent EEV challenge proposes a dense affective understanding task that requires frame-level affective prediction. In this paper, we propose a multi-granularity network with modal attention (MGN-MA), which employs multi-granularity features to better describe the target frame. Specifically, the multi-granularity features are divided into frame-level, clip-level and video-level features, which correspond to visually salient content, semantic context and video theme information, respectively. A modal attention fusion module is then designed to fuse the multi-granularity features and emphasize the more affect-relevant modalities. Finally, the fused feature is fed into a Mixture of Experts (MoE) classifier to predict the expressions. With additional model-ensemble post-processing, the proposed method achieves a correlation score of 0.02292 in the EEV challenge.
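
The fusion and classification steps described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the attention form, feature sizes, or expert layout, so everything here is an assumption made for illustration, including the single scoring vector `w_att`, the shared feature dimension `D` across the three granularities, the number of experts `K`, the number of expression classes `C`, and the function names `modal_attention_fuse` and `moe_predict`. Random weights stand in for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modal_attention_fuse(feats, w_att):
    """Fuse M modality features of shape (M, D) into one (D,) vector.

    Assumed attention form: one scalar score per modality from a shared
    scoring vector, normalized with a softmax over modalities.
    """
    scores = feats @ w_att        # (M,) one score per modality
    weights = softmax(scores)     # (M,) attention weights, sum to 1
    fused = weights @ feats       # (D,) attention-weighted sum
    return fused, weights

def moe_predict(fused, w_gate, w_expert):
    """Assumed per-class mixture of K logistic experts.

    w_gate and w_expert both have shape (D, C, K).
    """
    gate = softmax(np.einsum("d,dck->ck", fused, w_gate), axis=-1)  # (C, K)
    expert = sigmoid(np.einsum("d,dck->ck", fused, w_expert))       # (C, K)
    return (gate * expert).sum(axis=-1)                             # (C,)

# Hypothetical sizes, chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
D, C, K = 8, 15, 4

# Rows stand in for frame-level, clip-level and video-level features.
feats = rng.normal(size=(3, D))
w_att = rng.normal(size=D)

fused, weights = modal_attention_fuse(feats, w_att)
probs = moe_predict(
    fused,
    rng.normal(size=(D, C, K)),
    rng.normal(size=(D, C, K)),
)
```

Here `weights` indicates how strongly each granularity contributes to the fused feature for the target frame, and `probs` holds one score per expression class. In a trained model these weights would be learned jointly with the rest of the network; the model-ensemble post-processing mentioned in the abstract is not covered by this sketch.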

URL

https://arxiv.org/abs/2106.09964

PDF

https://arxiv.org/pdf/2106.09964.pdf