Abstract
Downsampling operators break the shift invariance of convolutional neural networks (CNNs), which degrades the robustness of the learned features under even small pixel-level shifts. Through a large-scale correlation analysis framework, we study the shift invariance of CNNs by inspecting existing downsampling operators in terms of their maximum-sampling bias (MSB), and find that MSB is negatively correlated with shift invariance. Based on this crucial insight, we propose a learnable pooling operator called Translation Invariant Polyphase Sampling (TIPS) and two regularizations on the intermediate feature maps of TIPS to reduce MSB and learn translation-invariant representations. TIPS can be integrated into any CNN and trained end-to-end with marginal computational overhead. Our experiments demonstrate that TIPS yields consistent gains in accuracy, shift consistency, and shift fidelity over previous methods on multiple benchmarks for image classification and semantic segmentation, and also improves adversarial and distributional robustness. TIPS attains the lowest MSB of all methods compared, which explains our strong empirical results.
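To make the abstract's opening claim concrete: a strided max-pooling operator samples the maximum on a fixed grid, so a one-pixel input shift can move values across window boundaries and change the output. The following is a minimal NumPy sketch (not from the paper; `max_pool_1d` is a hypothetical helper) illustrating this on a 1-D signal:

```python
import numpy as np

def max_pool_1d(x, k=2, s=2):
    """Max pooling with window size k and stride s (no padding)."""
    return np.array([x[i:i + k].max() for i in range(0, len(x) - k + 1, s)])

# A sparse signal and its one-pixel circular shift.
x = np.array([0, 1, 0, 0, 0, 1, 0, 0])
y0 = max_pool_1d(x)             # -> [1, 0, 1, 0]
y1 = max_pool_1d(np.roll(x, 1)) # -> [0, 1, 0, 1]

# The pooled outputs differ: the operator is not shift invariant,
# because the sampling grid is fixed relative to the input frame.
print(y0, y1)
```

The peaks land in different pooling windows before and after the shift, so the downsampled representation changes even though the input content is identical up to translation; this is the kind of sampling-grid dependence that TIPS is designed to reduce.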
URL
https://arxiv.org/abs/2404.07410