
Pragmatic inference and visual abstraction enable contextual flexibility during visual communication

2019-03-11 17:18:16
Judith Fan, Robert Hawkins, Mike Wu, Noah Goodman

Abstract

Visual modes of communication are ubiquitous in modern life. Here we investigate drawing, the most basic form of visual communication. Communicative drawing poses a core challenge for theories of how vision and social cognition interact, requiring a detailed understanding of how sensory information and social context jointly determine what information is relevant to communicate. Participants (N=192) were paired in an online environment to play a sketching-based reference game. On each trial, both participants were shown the same four objects, but in different locations. The sketcher's goal was to draw one of these objects - the target - so that the viewer could select it from the array. There were two types of trials: close, where objects belonged to the same basic-level category, and far, where objects belonged to different categories. We found that people exploited information in common ground with their partner to efficiently communicate about the target: on far trials, sketchers achieved high recognition accuracy while applying fewer strokes, using less ink, and spending less time on their drawings than on close trials. We hypothesized that humans succeed in this task by recruiting two core competencies: (1) visual abstraction, the capacity to perceive the correspondence between an object and a drawing of it; and (2) pragmatic inference, the ability to infer what information would help a viewer distinguish the target from distractors. To evaluate this hypothesis, we developed a computational model of the sketcher that embodied both competencies, instantiated as a deep convolutional neural network nested within a probabilistic program. We found that this model fit human data well and outperformed lesioned variants, providing an algorithmically explicit theory of how perception and social cognition jointly support contextual flexibility in visual communication.
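The model described above pairs a literal viewer (a deep convolutional network that scores how well a candidate sketch depicts each object in the shared array) with a pragmatic sketcher that reasons about that viewer in context. As a rough illustration of this class of model, the following is a minimal, hypothetical Python sketch in the style of rational speech act (RSA) models; the feature vectors, cost terms, weights, and function names below are placeholders standing in for the paper's CNN embeddings and fitted parameters, not the authors' implementation.

# Illustrative only: a literal viewer picks objects from sketches via feature
# similarity, and a pragmatic sketcher chooses the sketch that best identifies
# the target in context while penalizing drawing cost (strokes, ink, time).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def literal_viewer(sketch_feats, object_feats):
    """P_L0(object | sketch): softmax over sketch-object feature similarity."""
    # cosine similarity between each candidate sketch and each object in the array
    s = sketch_feats / np.linalg.norm(sketch_feats, axis=1, keepdims=True)
    o = object_feats / np.linalg.norm(object_feats, axis=1, keepdims=True)
    return softmax(s @ o.T, axis=1)          # shape: (n_sketches, n_objects)

def pragmatic_sketcher(sketch_feats, object_feats, target_idx,
                       sketch_costs, informativity_weight=3.0, cost_weight=1.0):
    """P_S1(sketch | target, context) ∝ exp(w_inf * log L0(target | sketch) - w_cost * cost)."""
    viewer = literal_viewer(sketch_feats, object_feats)
    utility = (informativity_weight * np.log(viewer[:, target_idx] + 1e-12)
               - cost_weight * np.asarray(sketch_costs))
    return softmax(utility)

# Hypothetical example: 3 candidate sketches, 4 objects in the shared context.
rng = np.random.default_rng(0)
object_feats = rng.normal(size=(4, 8))       # stand-ins for CNN object features
sketch_feats = object_feats[[0, 0, 1]] + 0.1 * rng.normal(size=(3, 8))
sketch_costs = [0.2, 1.0, 0.3]               # e.g. more detailed sketches cost more

print(pragmatic_sketcher(sketch_feats, object_feats, target_idx=0,
                         sketch_costs=sketch_costs))

Under a sketch like this, the close-versus-far contrast falls out of the context: when distractors share the target's basic-level category, only more detailed (costlier) sketches make the target identifiable to the viewer, whereas on far trials a sparse sketch already separates it, matching the reported savings in strokes, ink, and time.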


URL

https://arxiv.org/abs/1903.04448

PDF

https://arxiv.org/pdf/1903.04448.pdf

