Abstract
Our theoretical understanding of the inner workings of general convolutional neural networks (CNNs) is limited. Here we present a new stepping stone towards such an understanding in the form of a theory of learning in linear CNNs. By analyzing the gradient descent equations, we discover that the use of convolutions leads to a mismatch between the dataset structure and the network structure. We show that linear CNNs discover the statistical structure of the dataset with non-linear, stage-like transitions, and that the speed of discovery changes depending on this structural mismatch. Moreover, we find that the mismatch lies at the heart of what we call the 'dominant frequency bias', whereby linear CNNs make these discoveries using only the dominant frequencies of the different structural parts present in the dataset. Our findings can help explain several characteristics of general CNNs, such as their shortcut learning and their tendency to rely on texture rather than shape.
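The stage-like dynamics described above are easy to reproduce in miniature. The sketch below is a minimal illustration under assumed settings, not the paper's exact setup: a two-layer linear network whose first layer is a circular convolution, a synthetic rank-3 teacher map `W_star` with well-separated singular values, and plain full-batch gradient descent (all of these specifics are chosen here for illustration). Tracking the effective singular values of the end-to-end map shows each dataset mode being discovered in its own sigmoidal stage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset with planted low-rank structure (an illustrative
# assumption; the paper's datasets may differ). Targets come from a
# rank-3 map whose singular values are well separated, so each mode
# should be "discovered" in its own stage.
d, n = 16, 512
U = np.linalg.qr(rng.normal(size=(d, d)))[0][:, :3]
V = np.linalg.qr(rng.normal(size=(d, d)))[0][:, :3]
s_true = np.array([3.0, 2.0, 1.0])
W_star = (U * s_true) @ V.T
X = rng.normal(size=(d, n))
Y = W_star @ X

def circulant(w):
    # Row i applies the kernel w shifted by i positions, i.e.
    # C[i, j] = w[(j - i) % d]: a circular convolution layer.
    return np.stack([np.roll(w, i) for i in range(len(w))])

# Two-layer linear CNN: shared convolution kernel w1, dense readout W2.
# Small initialization makes the learning plateaus visible.
w1 = rng.normal(size=d) * 0.01
W2 = rng.normal(size=(d, d)) * 0.01
lr = 0.02

for step in range(4001):
    C = circulant(w1)
    H = C @ X                          # hidden layer (convolution output)
    E = W2 @ H - Y                     # residuals
    if step % 500 == 0:
        loss = 0.5 * np.mean(np.sum(E**2, axis=0))
        # Effective singular values of the end-to-end map along the
        # planted modes; each rises in its own sigmoidal stage.
        s_eff = np.array([U[:, a] @ (W2 @ C) @ V[:, a] for a in range(3)])
        print(f"step {step:5d}  loss {loss:8.4f}  modes {np.round(s_eff, 2)}")
    grad_W2 = E @ H.T / n
    grad_C = W2.T @ E @ X.T / n        # gradient w.r.t. the full circulant
    # Weight sharing: the kernel gradient sums grad_C along the shifted
    # diagonals that all contain the same kernel entry.
    idx = np.arange(d)
    grad_w1 = np.array([grad_C[idx, (idx + k) % d].sum() for k in range(d)])
    W2 -= lr * grad_W2
    w1 -= lr * grad_w1
```

The small weight initialization is what separates the stages: each mode's effective singular value grows exponentially at a rate set by its target singular value, so larger modes escape the plateau first, while a larger initialization would blur the transitions together.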
URL
https://arxiv.org/abs/2303.02034