Abstract
Misinformation, i.e. factually incorrect information, is often conveyed in multiple modalities, e.g. an image accompanied by a caption. Such multimodal misinformation is perceived as more credible by humans, and spreads faster and wider than its text-only counterparts. While an increasing body of research investigates automated fact-checking (AFC), previous surveys mostly focus on textual misinformation. In this survey, we conceptualise a framework for AFC that includes subtasks unique to multimodal misinformation. Furthermore, we discuss related terminology developed in different communities in the context of our framework. We focus on four modalities prevalent in real-world fact-checking: text, image, audio, and video. We survey benchmarks and models, and discuss limitations and promising directions for future research.
URL
https://arxiv.org/abs/2305.13507