Abstract
The ability of intelligent agents to play games in a human-like fashion is popularly considered a benchmark of progress in Artificial Intelligence. Similarly, performance on multi-disciplinary tasks such as Visual Question Answering (VQA) is considered a marker for gauging progress in Computer Vision. In our work, we bring games and VQA together. Specifically, we introduce the first computational model aimed at Pictionary, the popular word-guessing social game. We first introduce Sketch-QA, an elementary version of the Visual Question Answering task. Styled after Pictionary, Sketch-QA uses incrementally accumulated sketch stroke sequences as visual data. Notably, Sketch-QA involves asking a fixed question ("What object is being drawn?") and gathering open-ended guess-words from human guessers. We analyze the resulting dataset and present many interesting findings therein. To mimic Pictionary-style guessing, we subsequently propose a deep neural model which generates guess-words in response to temporally evolving human-drawn sketches. Our model even makes human-like mistakes while guessing, thus amplifying the human mimicry factor. We evaluate our model on the large-scale guess-word dataset generated via the Sketch-QA task and compare it with various baselines. We also conduct a Visual Turing Test to obtain human impressions of the guess-words generated by humans and our model. Experimental results demonstrate the promise of our approach for Pictionary and similarly themed games.
URL
https://arxiv.org/abs/1801.09356