Paper Reading AI Learner

Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs

2024-03-23 03:57:28
Aakash Lahoti, Stefani Karp, Ezra Winston, Aarti Singh, Yuanzhi Li

Abstract

Vision tasks are characterized by the properties of locality and translation invariance. The superior performance of convolutional neural networks (CNNs) on these tasks is widely attributed to the inductive bias of locality and weight sharing baked into their architecture. Existing attempts to quantify the statistical benefits of these biases in CNNs over locally connected convolutional neural networks (LCNs) and fully connected neural networks (FCNs) fall into one of the following categories: either they disregard the optimizer and only provide uniform convergence upper bounds with no separating lower bounds, or they consider simplistic tasks that do not truly mirror the locality and translation invariance as found in real-world vision tasks. To address these deficiencies, we introduce the Dynamic Signal Distribution (DSD) classification task that models an image as consisting of $k$ patches, each of dimension $d$, and the label is determined by a $d$-sparse signal vector that can freely appear in any one of the $k$ patches. On this task, for any orthogonally equivariant algorithm like gradient descent, we prove that CNNs require $\tilde{O}(k+d)$ samples, whereas LCNs require $\Omega(kd)$ samples, establishing the statistical advantages of weight sharing in translation invariant tasks. Furthermore, LCNs need $\tilde{O}(k(k+d))$ samples, compared to $\Omega(k^2d)$ samples for FCNs, showcasing the benefits of locality in local tasks. Additionally, we develop information theoretic tools for analyzing randomized algorithms, which may be of interest for statistical research.
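
To make the Dynamic Signal Distribution (DSD) task concrete, here is a minimal NumPy sketch of a data generator in the spirit of the description above. The specifics — a unit-norm Gaussian signal vector, Gaussian noise in the unoccupied patches, a label equal to the random sign attached to the planted signal, and the name `make_dsd_samples` — are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def make_dsd_samples(n, k, d, seed=0):
    """Draw n inputs made of k patches (dimension d each) with a planted signal.

    Illustrative assumptions: unoccupied patches hold Gaussian noise, the
    occupied patch holds +/- w for a fixed unit-norm vector w, the label is
    that sign, and the signal's patch index is uniform over the k slots.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                 # fixed signal direction
    X = rng.standard_normal((n, k, d))     # noise in every patch
    y = rng.choice([-1.0, 1.0], size=n)    # labels: random signs
    pos = rng.integers(0, k, size=n)       # which patch carries the signal
    X[np.arange(n), pos] = y[:, None] * w  # plant the signed signal vector
    return X.reshape(n, k * d), y

# Example: 8 samples with k = 4 patches of dimension d = 16.
X, y = make_dsd_samples(n=8, k=4, d=16)
print(X.shape, y)
```

The intuition behind the reported separations is visible in this layout: a convolutional filter of width $d$ is shared across all $k$ patch positions, a locally connected layer must learn a separate $d$-dimensional weight for each patch, and a fully connected layer is unconstrained over the whole $kd$-dimensional input, which loosely mirrors the $\tilde{O}(k+d)$, $\tilde{O}(k(k+d))$, and $\Omega(k^2d)$ sample complexities quoted above.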

URL

https://arxiv.org/abs/2403.15707

PDF

https://arxiv.org/pdf/2403.15707.pdf

