Abstract
Although deep neural networks can achieve human-level performance on many object recognition benchmarks, prior work suggests that these same models fail to learn simple abstract relations, such as determining whether two objects are the same or different. Much of this prior work focuses on training convolutional neural networks to classify images of two same or two different abstract shapes, testing generalization only on within-distribution stimuli. In this article, we comprehensively study whether deep neural networks can acquire and generalize same-different relations both within and out of distribution, using a variety of architectures, forms of pretraining, and fine-tuning datasets. We find that certain pretrained transformers can learn a same-different relation that generalizes with near-perfect accuracy to out-of-distribution stimuli. Furthermore, we find that fine-tuning on abstract shapes that lack texture or color provides the strongest out-of-distribution generalization. Our results suggest that, with the right approach, deep neural networks can learn generalizable same-different visual relations.
URL
https://arxiv.org/abs/2310.09612