Half-Space Feature Learning in Neural Networks

2024-04-05 12:03:19
Mahesh Lorik Yadav, Harish Guruprasad Ramaswamy, Chandrashekar Lakshminarayanan

Abstract

There currently exist two extreme viewpoints on neural network feature learning: (i) neural networks simply implement a kernel method (à la NTK) and hence no features are learned; (ii) neural networks can represent (and hence learn) intricate hierarchical features suitable for the data. We argue in this paper that neither interpretation is likely to be correct, based on a novel viewpoint. Neural networks can be viewed as a mixture of experts, where each expert corresponds to a path through a sequence of hidden units, with path length equal to the number of layers. We use this alternate interpretation to motivate a model, called the Deep Linearly Gated Network (DLGN), which sits midway between deep linear networks and ReLU networks. Unlike deep linear networks, the DLGN is capable of learning non-linear features (which are then linearly combined), and unlike ReLU networks these features are ultimately simple: each feature is effectively an indicator function for a region compactly described as an intersection of half-spaces in the input space, one half-space per layer. This viewpoint allows for a comprehensive global visualization of features, unlike the local visualizations for neurons based on saliency/activation/gradient maps. Feature learning is shown to happen in DLGNs, and the mechanism is the learning of half-spaces in the input space that contain smooth regions of the target function. Due to the structure of DLGNs, the neurons in later layers are fundamentally the same as those in earlier layers (they all represent a half-space); however, the dynamics of gradient descent impart a distinct clustering to the later layer neurons. We hypothesize that ReLU networks also have similar feature learning behaviour.
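
The half-space description in the abstract can be made concrete with a small sketch. The code below is not the authors' implementation; it only illustrates the idea that a path feature is the indicator of an intersection of half-spaces (one per layer) and that the model output is a linear combination of such features. All names (`HalfSpace`, `in_half_space`, `path_feature`, `model_output`) and the example half-spaces and weights are hypothetical, chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed representation: a half-space {x : w . x + b >= 0} stored as (w, b).
HalfSpace = Tuple[Sequence[float], float]


def in_half_space(x: Sequence[float], hs: HalfSpace) -> bool:
    """Return True if x lies in the half-space w . x + b >= 0."""
    w, b = hs
    return sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0


def path_feature(x: Sequence[float], half_spaces: List[HalfSpace]) -> float:
    """Indicator of the intersection of the half-spaces along one path."""
    return 1.0 if all(in_half_space(x, hs) for hs in half_spaces) else 0.0


def model_output(
    x: Sequence[float],
    paths: List[List[HalfSpace]],
    weights: Sequence[float],
) -> float:
    """Linear combination of the path features (illustrative weights)."""
    return sum(c * path_feature(x, p) for p, c in zip(paths, weights))


# Illustrative two-layer example on 2-D inputs (values are made up).
path_a = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]   # x0 >= 0 and x1 >= 0
path_b = [([-1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]  # x0 <= 0 and x1 >= 0
paths = [path_a, path_b]
weights = [2.0, -1.0]

print(model_output([1.0, 1.0], paths, weights))    # 2.0
print(model_output([-1.0, 1.0], paths, weights))   # -1.0
print(model_output([1.0, -1.0], paths, weights))   # 0.0
```

Each feature here is constant on a convex polyhedral region of the input space, which is what makes a global picture of the features possible under this viewpoint. How the half-spaces are parameterized and trained in an actual DLGN is described in the paper itself and is not reproduced here.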

URL

https://arxiv.org/abs/2404.04312

PDF

https://arxiv.org/pdf/2404.04312.pdf
