The role of object-centric representations, guided attention, and external memory on generalizing visual relations

Abstract

Visual reasoning is a long-term goal of vision research. In the last decade, several works have attempted to apply deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in how well the learned relations generalize. In recent years, several innovations in DNNs have been developed to enable learning abstract relations from images. In this work, we systematically evaluate a series of DNNs that integrate mechanisms such as slot attention, recurrently guided attention, and external memory on the simplest possible visual reasoning task: deciding whether two objects are the same or different. We found that, although some models generalized the same-different relation to specific types of images better than others, no model was able to generalize this relation across the board. We conclude that abstract visual reasoning remains a largely unresolved challenge for DNNs.

URL

https://arxiv.org/abs/2304.07091

PDF

https://arxiv.org/pdf/2304.07091.pdf