Abstract
In this work, we construct and release a multi-domain and multi-modality event dataset (MMED) containing 25,165 textual news articles collected from hundreds of news media sites (e.g., Yahoo News, Google News, CNN News) and 76,516 image posts shared on the Flickr social media platform, all annotated according to 412 real-world events. The dataset is collected to explore two problems: organizing heterogeneous data contributed by professionals and amateurs in different data domains, and transferring event knowledge obtained from one data domain to a heterogeneous data domain, thus summarizing data from different contributors. We hope that the release of the MMED dataset will stimulate innovative research on related challenging problems, such as event discovery, cross-modal (event) retrieval, and visual question answering.
URL
https://arxiv.org/abs/1904.02354