Abstract
Basic vision tasks involving low-level perception, such as object recognition, detection, and tracking, have seen dramatic progress. Unfortunately, an enormous performance gap remains between artificial vision systems and human intelligence on higher-level vision problems, especially those involving reasoning. Earlier attempts at equipping machines with high-level reasoning have centered on Visual Question Answering (VQA), a typical task that associates vision with language understanding. In this work, we propose a new dataset, built in the context of Raven's Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation. Unlike previous work on measuring abstract reasoning with RPM, we establish a semantic link between vision and reasoning by providing a structure representation. This addition enables a new type of abstract reasoning by jointly operating on the structure representation. We evaluate machine reasoning ability using modern computer vision models on the newly proposed dataset, and we provide human performance as a reference. Finally, we show consistent improvement across all models by incorporating a simple neural module that combines visual understanding and structure reasoning.
URL
https://arxiv.org/abs/1903.02741