Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding

2022-03-11 01:45:33

Yidan Sun, Qin Chao, Boyang Li

arXiv_CL

arXiv_CL Zero-Shot

Abstract

Despite recent advances in AI, story understanding remains an open and under-investigated problem. We collect, preprocess, and publicly release a video-language story dataset, Synopses of Movie Narratives (SyMoN), containing 5,193 video summaries of popular movies and TV series. SyMoN captures naturalistic storytelling videos made by human creators for human audiences, and has higher story coverage and more frequent mental-state references than similar video-language story datasets. Unlike most existing video-text datasets, SyMoN features large semantic gaps between the visual and textual modalities due to the prevalence of reporting bias and mental-state descriptions. We establish benchmarks on video-text retrieval and zero-shot alignment on movie summary videos. With SyMoN, we hope to lay the groundwork for progress in multimodal story understanding.

URL

https://arxiv.org/abs/2203.05711

PDF

https://arxiv.org/pdf/2203.05711.pdf