Paper Reading AI Learner

A simple theory for training response of deep neural networks

2024-05-07 07:20:15
Kenichi Nakazato

Abstract

Deep neural networks give us a powerful method for modeling the relationship between input and output in a training dataset. Such a network can be regarded as a complex adaptive system consisting of many artificial neurons that together work as an adaptive memory. The network's behavior is a training dynamics with a feedback loop from the evaluation of the loss function. It is already known that the training response can be constant or can show power-law-like aging in some ideal situations. However, gaps remain between those findings and other complex phenomena, such as network fragility. To fill this gap, we introduce a very simple network and analyze it. We show that the training response consists of several distinct factors that depend on the training stage, the activation function, or the training method. In addition, we show that feature space reduction arises as an effect of stochastic training dynamics and can result in network fragility. Finally, we discuss some complex phenomena of deep networks.

URL

https://arxiv.org/abs/2405.04074

PDF

https://arxiv.org/pdf/2405.04074.pdf

