Characterizing the Entities in Harmful Memes: Who is the Hero, the Villain, the Victim?

Abstract

Memes can sway people's opinions on social media because they combine visual and textual information in an easy-to-consume manner. Since memes can go viral almost instantly, it is crucial to infer their intent and potential harmfulness so that timely measures can be taken. A common problem in meme comprehension is detecting the entities referenced and characterizing the role of each of them. Here, we aim to understand whether a meme glorifies, vilifies, or victimizes each entity it refers to. To this end, we address the task of role identification of entities in harmful memes, i.e., detecting who is the 'hero', the 'villain', and the 'victim' in the meme, if any. We use HVVMemes, a dataset of memes on US politics and COVID-19 released recently as part of the CONSTRAINT@ACL-2022 shared task. It contains memes, the entities they reference, and the associated roles: hero, villain, victim, and other. We further design VECTOR (Visual-semantic role dEteCToR), a robust multi-modal framework for the task that integrates entity-based contextual information into the multi-modal representation, and we compare it to several standard unimodal (text-only or image-only) and multi-modal (image+text) models. Our experimental results show that the proposed model achieves an improvement of 4% over the best baseline and 1% over the best competing stand-alone submission from the shared task. Besides presenting an extensive experimental setup with comparative analyses, we highlight the challenges encountered in addressing the complex task of semantic role labeling within memes.
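
The task described above can be read as a four-way classification over (meme, entity) pairs. The following is a minimal sketch of that framing only. The record layout, the field names, the `to_classification_pairs` helper, and the example meme are all hypothetical and are not taken from HVVMemes or from VECTOR; only the four role labels come from the abstract.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# The four role labels named in the abstract.
ROLES = ("hero", "villain", "victim", "other")


@dataclass
class MemeExample:
    """Hypothetical record layout; the real dataset schema may differ."""
    image_path: str               # path to the meme image
    text: str                     # text extracted from the meme
    entity_roles: Dict[str, str]  # entity mention -> role label


def to_classification_pairs(example: MemeExample) -> List[Tuple[str, str, int]]:
    """Flatten one meme into (text, entity, role_index) training pairs."""
    pairs = []
    for entity, role in example.entity_roles.items():
        if role not in ROLES:
            raise ValueError(f"unknown role: {role!r}")
        pairs.append((example.text, entity, ROLES.index(role)))
    return pairs


# Invented example for illustration only.
example = MemeExample(
    image_path="meme_001.png",
    text="Doctors work around the clock while the virus spreads",
    entity_roles={"doctors": "hero", "virus": "villain"},
)

for text, entity, label in to_classification_pairs(example):
    print(entity, ROLES[label])
```

Under this framing, a unimodal text baseline would consume only the text and the entity, while a multi-modal model would additionally encode the image at `image_path`.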

URL

https://arxiv.org/abs/2301.11219

PDF

https://arxiv.org/pdf/2301.11219.pdf