The Overfocusing Bias of Convolutional Neural Networks: A Saliency-Guided Regularization Approach

2024-09-25 21:30:16
David Bertoin, Eduardo Hugo Sanchez, Mehdi Zouitine, Emmanuel Rachelson

Abstract

Despite transformers being considered the new standard in computer vision, convolutional neural networks (CNNs) still outperform them in low-data regimes. Nonetheless, CNNs often base their decisions on narrow, specific regions of input images, especially when training data is limited. This behavior can severely compromise a model's generalization, making it disproportionately dependent on features that may not represent the broader context of an image. While the conditions leading to this phenomenon remain elusive, the primary intent of this article is to shed light on this observed behavior of neural networks. Our research prioritizes a comprehensive understanding of the phenomenon and outlines an initial response to it. To that end, we introduce Saliency Guided Dropout (SGDrop), a regularization approach tailored to this issue. SGDrop applies attribution methods to the feature map to identify the most salient features and then reduces their influence during training, encouraging the network to diversify its attention rather than focus solely on a few standout areas. Experiments across several visual classification benchmarks validate SGDrop's role in enhancing generalization. Notably, models trained with SGDrop display more expansive attributions and neural activity, offering a more comprehensive view of input images than their traditionally trained counterparts.
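
The abstract describes SGDrop's mechanism only at a high level: attribute saliency on an intermediate feature map, then suppress the most salient locations during training. The following PyTorch sketch illustrates that general idea under stated assumptions; the Grad-CAM-style attribution, the function name sgdrop_mask, and the drop_fraction hyperparameter are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def sgdrop_mask(feature_map, logits, labels, drop_fraction=0.1):
    """Illustrative saliency-guided dropout mask (assumed, not the paper's exact method).

    feature_map: (B, C, H, W) intermediate CNN activations, part of the autograd graph.
    logits:      (B, num_classes) outputs computed from feature_map.
    labels:      (B,) ground-truth class indices.
    Returns a (B, 1, H, W) binary mask that zeroes the most salient spatial locations.
    """
    # Attribution: gradient of the true-class scores w.r.t. the feature map
    # (a Grad-CAM-style proxy for the attribution method used in the paper).
    score = logits.gather(1, labels.unsqueeze(1)).sum()
    grads = torch.autograd.grad(score, feature_map, retain_graph=True)[0]

    # Channel weights from spatially pooled gradients, then a spatial saliency map.
    weights = grads.mean(dim=(2, 3), keepdim=True)           # (B, C, 1, 1)
    saliency = F.relu((weights * feature_map).sum(dim=1))    # (B, H, W)

    # Zero out the top `drop_fraction` most salient spatial locations,
    # forcing the network to rely on a wider region of the input.
    b, h, w = saliency.shape
    k = max(1, int(drop_fraction * h * w))
    flat = saliency.view(b, -1)
    kth_value = flat.topk(k, dim=1).values[:, -1:]           # k-th largest value per sample
    mask = (flat < kth_value).float().view(b, 1, h, w)
    return mask
```

In a training loop, one would multiply the intermediate activations by the returned mask (for example, masked = feature_map * mask), recompute the classification loss from the masked features, and backpropagate as usual; the attribution method, the layer to mask, and the drop schedule are design choices detailed in the paper.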

URL

https://arxiv.org/abs/2409.17370

PDF

https://arxiv.org/pdf/2409.17370.pdf

