Game of Sketches: Deep Recurrent Models of Pictionary-style Word Guessing

2018-01-29 03:54:03
Ravi Kiran Sarvadevabhatla, Shiv Surya, Trisha Mittal, Venkatesh Babu Radhakrishnan
           

Abstract

The ability of intelligent agents to play games in a human-like fashion is popularly considered a benchmark of progress in Artificial Intelligence. Similarly, performance on multi-disciplinary tasks such as Visual Question Answering (VQA) is considered a marker for gauging progress in Computer Vision. In our work, we bring games and VQA together. Specifically, we introduce the first computational model aimed at Pictionary, the popular word-guessing social game. We first introduce Sketch-QA, an elementary version of the Visual Question Answering task. Styled after Pictionary, Sketch-QA uses incrementally accumulated sketch stroke sequences as visual data. Notably, Sketch-QA involves asking a fixed question ("What object is being drawn?") and gathering open-ended guess-words from human guessers. We analyze the resulting dataset and present many interesting findings therein. To mimic Pictionary-style guessing, we subsequently propose a deep neural model which generates guess-words in response to temporally evolving human-drawn sketches. Our model even makes human-like mistakes while guessing, thus amplifying the human mimicry factor. We evaluate our model on the large-scale guess-word dataset generated via the Sketch-QA task and compare it with various baselines. We also conduct a Visual Turing Test to obtain human impressions of the guess-words generated by humans and by our model. Experimental results demonstrate the promise of our approach for Pictionary and similarly themed games.
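
To make the phrase "incrementally accumulated sketch stroke sequences" concrete, here is a minimal Python sketch of how such input could be laid out. It is an illustration only: the abstract does not specify the dataset's actual format, so the names `Stroke`, `GuessQuery`, and `accumulate` are hypothetical, as is the choice to represent a stroke as a list of (x, y) points. The only element taken from the abstract is the fixed question.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: a stroke is an ordered list of (x, y) pen positions.
Point = Tuple[float, float]
Stroke = List[Point]

# The fixed question quoted in the abstract.
QUESTION = "What object is being drawn?"


@dataclass
class GuessQuery:
    """Hypothetical container: what a guesser sees at one time step."""
    question: str
    strokes_so_far: List[Stroke]


def accumulate(strokes: List[Stroke]) -> List[GuessQuery]:
    """Build one query per time step; step t shows strokes 1..t."""
    return [
        GuessQuery(QUESTION, strokes[: t + 1])
        for t in range(len(strokes))
    ]


# A toy three-stroke drawing (a triangle), purely for illustration.
sketch: List[Stroke] = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(1.0, 0.0), (1.0, 1.0)],
    [(1.0, 1.0), (0.0, 0.0)],
]
queries = accumulate(sketch)
```

Each element of `queries` pairs the same question with a progressively more complete drawing, which is the setting in which a guesser, human or model, would be asked for a guess-word.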

URL

https://arxiv.org/abs/1801.09356

PDF

https://arxiv.org/pdf/1801.09356.pdf

