Abstract
Verifying the truthfulness of a claim usually requires joint multi-modal reasoning over both textual and visual evidence, such as analyzing a textual caption together with a chart image. In addition, to make the reasoning process transparent, a textual explanation is needed to justify the verification result. However, most claim verification work focuses only on reasoning over textual evidence or ignores explainability, resulting in inaccurate and unconvincing verification. To address this problem, we propose a novel model that jointly performs evidence retrieval, multi-modal claim verification, and explanation generation. For evidence retrieval, we construct a two-layer multi-modal graph over claims and evidence, on which we design image-to-text and text-to-image reasoning for multi-modal retrieval. For claim verification, we propose token- and evidence-level fusion to integrate claim and evidence embeddings for multi-modal verification. For explanation generation, we introduce a multi-modal Fusion-in-Decoder for explainability. Finally, since almost all existing datasets cover only the general domain, we create a scientific dataset, AIChartClaim, in the AI domain to complement the claim verification community. Experiments demonstrate the strength of our model.
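The abstract does not specify how token- and evidence-level fusion are implemented. As a minimal illustrative sketch (not the authors' method), token-level fusion can be read as cross-attention from claim tokens over the concatenated textual and visual evidence tokens, while evidence-level fusion pools each sequence into a single vector before combining them. All function names, dimensions, and the mean-pooling choice below are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def token_level_fusion(claim_tokens, evidence_tokens):
    """Hypothetical token-level fusion: each claim token attends over
    all evidence tokens (scaled dot-product cross-attention + residual)."""
    d = claim_tokens.shape[-1]
    scores = claim_tokens @ evidence_tokens.T / np.sqrt(d)
    attn = softmax(scores, axis=-1)
    return claim_tokens + attn @ evidence_tokens

def evidence_level_fusion(claim_tokens, evidence_pieces):
    """Hypothetical evidence-level fusion: mean-pool each sequence,
    then concatenate the claim vector with the pooled evidence vector."""
    claim_vec = claim_tokens.mean(axis=0)
    ev_vecs = np.stack([e.mean(axis=0) for e in evidence_pieces])
    return np.concatenate([claim_vec, ev_vecs.mean(axis=0)])

rng = np.random.default_rng(0)
claim = rng.normal(size=(5, 16))    # 5 claim tokens, embedding dim 16
text_ev = rng.normal(size=(7, 16))  # textual evidence (e.g. caption) tokens
img_ev = rng.normal(size=(9, 16))   # visual evidence (e.g. chart) patch tokens

fused_tokens = token_level_fusion(claim, np.vstack([text_ev, img_ev]))
fused_vec = evidence_level_fusion(claim, [text_ev, img_ev])
print(fused_tokens.shape, fused_vec.shape)  # (5, 16) (32,)
```

In a real system the pooled vector would feed a verification classifier and the fused token sequence a generator; here the sketch only shows how the two fusion granularities differ.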
URL
https://arxiv.org/abs/2602.10023