Paper Reading AI Learner

Role of Locality and Weight Sharing in Image-Based Tasks: A Sample Complexity Separation between CNNs, LCNs, and FCNs

2024-03-23 03:57:28
Aakash Lahoti, Stefani Karp, Ezra Winston, Aarti Singh, Yuanzhi Li

Abstract

Vision tasks are characterized by the properties of locality and translation invariance. The superior performance of convolutional neural networks (CNNs) on these tasks is widely attributed to the inductive bias of locality and weight sharing baked into their architecture. Existing attempts to quantify the statistical benefits of these biases in CNNs over locally connected convolutional neural networks (LCNs) and fully connected neural networks (FCNs) fall into one of the following categories: either they disregard the optimizer and only provide uniform convergence upper bounds with no separating lower bounds, or they consider simplistic tasks that do not truly mirror the locality and translation invariance as found in real-world vision tasks. To address these deficiencies, we introduce the Dynamic Signal Distribution (DSD) classification task that models an image as consisting of $k$ patches, each of dimension $d$, and the label is determined by a $d$-sparse signal vector that can freely appear in any one of the $k$ patches. On this task, for any orthogonally equivariant algorithm like gradient descent, we prove that CNNs require $\tilde{O}(k+d)$ samples, whereas LCNs require $\Omega(kd)$ samples, establishing the statistical advantages of weight sharing in translation invariant tasks. Furthermore, LCNs need $\tilde{O}(k(k+d))$ samples, compared to $\Omega(k^2d)$ samples for FCNs, showcasing the benefits of locality in local tasks. Additionally, we develop information theoretic tools for analyzing randomized algorithms, which may be of interest for statistical research.
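
To make the Dynamic Signal Distribution (DSD) task concrete, here is a minimal NumPy sketch of a data generator in the spirit of the description above. The specifics — a unit-norm Gaussian signal vector, Gaussian noise in the unoccupied patches, a label equal to the random sign attached to the planted signal, and the name `make_dsd_samples` — are illustrative assumptions rather than the paper's exact construction.

```python
import numpy as np

def make_dsd_samples(n, k, d, seed=0):
    """Draw n inputs made of k patches (dimension d each) with a planted signal.

    Illustrative assumptions: unoccupied patches hold Gaussian noise, the
    occupied patch holds +/- w for a fixed unit-norm vector w, the label is
    that sign, and the signal's patch index is uniform over the k slots.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                 # fixed signal direction
    X = rng.standard_normal((n, k, d))     # noise in every patch
    y = rng.choice([-1.0, 1.0], size=n)    # labels: random signs
    pos = rng.integers(0, k, size=n)       # which patch carries the signal
    X[np.arange(n), pos] = y[:, None] * w  # plant the signed signal vector
    return X.reshape(n, k * d), y

# Example: 8 samples with k = 4 patches of dimension d = 16.
X, y = make_dsd_samples(n=8, k=4, d=16)
print(X.shape, y)
```

The intuition behind the reported separations is visible in this layout: a convolutional filter of width $d$ is shared across all $k$ patch positions, a locally connected layer must learn a separate $d$-dimensional weight for each patch, and a fully connected layer is unconstrained over the whole $kd$-dimensional input, which loosely mirrors the $\tilde{O}(k+d)$, $\tilde{O}(k(k+d))$, and $\Omega(k^2d)$ sample complexities quoted above.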

URL

https://arxiv.org/abs/2403.15707

PDF

https://arxiv.org/pdf/2403.15707.pdf

