Abstract
Despite transformers being considered the new standard in computer vision, convolutional neural networks (CNNs) still outperform them in low-data regimes. Nonetheless, CNNs often make decisions based on narrow, specific regions of input images, especially when training data is limited. This behavior can severely compromise the model's generalization capabilities, making it disproportionately dependent on certain features that might not represent the broader context of images. While the conditions leading to this phenomenon remain elusive, the primary intent of this article is to shed light on this observed behavior of neural networks. Our research prioritizes comprehensive insight into this phenomenon and outlines an initial response to it. In line with this, we introduce Saliency Guided Dropout (SGDrop), a pioneering regularization approach tailored to address this specific issue. SGDrop uses attribution methods on the feature map to identify the most salient features and then reduces their influence during training. This process encourages the network to diversify its attention rather than focus solely on specific standout areas. Our experiments across several visual classification benchmarks validate SGDrop's role in enhancing generalization. Significantly, models incorporating SGDrop display more expansive attributions and neural activity, offering a more comprehensive view of input images in contrast to their traditionally trained counterparts.
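The core masking step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's exact formulation: it assumes an attribution map (e.g., Grad-CAM-style) over the feature map's spatial locations is already available, and the function name `sgdrop_mask` and the `drop_fraction` parameter are hypothetical.

```python
import numpy as np

def sgdrop_mask(attribution, drop_fraction=0.1):
    """Binary mask that zeroes the most salient spatial locations.

    attribution  -- 2D array of non-negative saliency scores (H x W),
                    assumed precomputed by an attribution method.
    drop_fraction -- illustrative fraction of locations to drop.
    """
    flat = attribution.ravel()
    k = max(1, int(drop_fraction * flat.size))
    # indices of the k most salient locations
    top = np.argpartition(flat, -k)[-k:]
    mask = np.ones_like(flat)
    mask[top] = 0.0
    return mask.reshape(attribution.shape)

# Toy example: a 4x4 attribution map with one dominant location.
attr = np.zeros((4, 4))
attr[0, 0] = 5.0  # the single most salient location
mask = sgdrop_mask(attr, drop_fraction=1 / 16)
# During training, the mask would multiply the feature map
# (broadcast over channels), suppressing the dominant region.
```

Applying such a mask forces gradient flow through less salient regions, which is the mechanism the abstract credits for broadening the model's attributions.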
URL
https://arxiv.org/abs/2409.17370