Paper Reading AI Learner

A Multimodal Sentiment Dataset for Video Recommendation

2021-09-17 03:10:42
Hongxuan Tang, Hao Liu, Xinyan Xiao, Hua Wu

Abstract

Recently, multimodal sentiment analysis has seen remarkable advance and a lot of datasets are proposed for its development. In general, current multimodal sentiment analysis datasets usually follow the traditional system of sentiment/emotion, such as positive, negative and so on. However, when applied in the scenario of video recommendation, the traditional sentiment/emotion system is hard to be leveraged to represent different contents of videos in the perspective of visual senses and language understanding. Based on this, we propose a multimodal sentiment analysis dataset, named baiDu Video Sentiment dataset (DuVideoSenti), and introduce a new sentiment system which is designed to describe the sentimental style of a video on recommendation scenery. Specifically, DuVideoSenti consists of 5,630 videos which displayed on Baidu, each video is manually annotated with a sentimental style label which describes the user's real feeling of a video. Furthermore, we propose UNIMO as our baseline for DuVideoSenti. Experimental results show that DuVideoSenti brings new challenges to multimodal sentiment analysis, and could be used as a new benchmark for evaluating approaches designed for video understanding and multimodal fusion. We also expect our proposed DuVideoSenti could further improve the development of multimodal sentiment analysis and its application to video recommendations.

Abstract (translated)

URL

https://arxiv.org/abs/2109.08333

PDF

https://arxiv.org/pdf/2109.08333.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot