
Pragmatic inference and visual abstraction enable contextual flexibility during visual communication

2019-03-11 17:18:16
Judith Fan, Robert Hawkins, Mike Wu, Noah Goodman

Abstract

Visual modes of communication are ubiquitous in modern life. Here we investigate drawing, the most basic form of visual communication. Communicative drawing poses a core challenge for theories of how vision and social cognition interact, requiring a detailed understanding of how sensory information and social context jointly determine what information is relevant to communicate. Participants (N=192) were paired in an online environment to play a sketching-based reference game. On each trial, both participants were shown the same four objects, but in different locations. The sketcher's goal was to draw one of these objects - the target - so that the viewer could select it from the array. There were two types of trials: close, where objects belonged to the same basic-level category, and far, where objects belonged to different categories. We found that people exploited information in common ground with their partner to efficiently communicate about the target: on far trials, sketchers achieved high recognition accuracy while applying fewer strokes, using less ink, and spending less time on their drawings than on close trials. We hypothesized that humans succeed in this task by recruiting two core competencies: (1) visual abstraction, the capacity to perceive the correspondence between an object and a drawing of it; and (2) pragmatic inference, the ability to infer what information would help a viewer distinguish the target from distractors. To evaluate this hypothesis, we developed a computational model of the sketcher that embodied both competencies, instantiated as a deep convolutional neural network nested within a probabilistic program. We found that this model fit human data well and outperformed lesioned variants, providing an algorithmically explicit theory of how perception and social cognition jointly support contextual flexibility in visual communication.
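The model described above pairs a literal viewer (a deep convolutional network that scores how well a candidate sketch depicts each object in the shared array) with a pragmatic sketcher that reasons about that viewer in context. As a rough illustration of this class of model, the following is a minimal, hypothetical Python sketch in the style of rational speech act (RSA) models; the feature vectors, cost terms, weights, and function names below are placeholders standing in for the paper's CNN embeddings and fitted parameters, not the authors' implementation.

# Illustrative only: a literal viewer picks objects from sketches via feature
# similarity, and a pragmatic sketcher chooses the sketch that best identifies
# the target in context while penalizing drawing cost (strokes, ink, time).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def literal_viewer(sketch_feats, object_feats):
    """P_L0(object | sketch): softmax over sketch-object feature similarity."""
    # cosine similarity between each candidate sketch and each object in the array
    s = sketch_feats / np.linalg.norm(sketch_feats, axis=1, keepdims=True)
    o = object_feats / np.linalg.norm(object_feats, axis=1, keepdims=True)
    return softmax(s @ o.T, axis=1)          # shape: (n_sketches, n_objects)

def pragmatic_sketcher(sketch_feats, object_feats, target_idx,
                       sketch_costs, informativity_weight=3.0, cost_weight=1.0):
    """P_S1(sketch | target, context) ∝ exp(w_inf * log L0(target | sketch) - w_cost * cost)."""
    viewer = literal_viewer(sketch_feats, object_feats)
    utility = (informativity_weight * np.log(viewer[:, target_idx] + 1e-12)
               - cost_weight * np.asarray(sketch_costs))
    return softmax(utility)

# Hypothetical example: 3 candidate sketches, 4 objects in the shared context.
rng = np.random.default_rng(0)
object_feats = rng.normal(size=(4, 8))       # stand-ins for CNN object features
sketch_feats = object_feats[[0, 0, 1]] + 0.1 * rng.normal(size=(3, 8))
sketch_costs = [0.2, 1.0, 0.3]               # e.g. more detailed sketches cost more

print(pragmatic_sketcher(sketch_feats, object_feats, target_idx=0,
                         sketch_costs=sketch_costs))

Under a sketch like this, the close-versus-far contrast falls out of the context: when distractors share the target's basic-level category, only more detailed (costlier) sketches make the target identifiable to the viewer, whereas on far trials a sparse sketch already separates it, matching the reported savings in strokes, ink, and time.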


URL

https://arxiv.org/abs/1903.04448

PDF

https://arxiv.org/pdf/1903.04448.pdf

