Paper Reading AI Learner

Deep Exploration of Epoch-wise Double Descent in Noisy Data: Signal Separation, Large Activation, and Benign Overfitting

2026-01-13 08:13:15
Tomoki Kubo, Ryuken Uda, Yusuke Iida

Abstract

Deep double descent is one of the key phenomena underlying the generalization capability of deep learning models. In this study, epoch-wise double descent, i.e., delayed generalization following overfitting, was empirically investigated by focusing on the evolution of internal structures. Fully connected neural networks of three different sizes were trained on the CIFAR-10 dataset with 30% label noise. By decomposing the loss curves into contributions from clean and noisy training data, the epoch-wise evolution of internal signals was analyzed separately for each subset. Three main findings were obtained from this analysis. First, the model achieved strong re-generalization on test data even after perfectly fitting the noisy training data during the double-descent phase, corresponding to a "benign overfitting" state. Second, noisy data were learned after clean data, and as learning progressed, their corresponding internal activations became increasingly separated in the outer layers; this separation enabled the model to overfit only the noisy data. Third, a single, very large activation emerged in the shallow layer across all models; this phenomenon is referred to as "outliers," "massive activations," and "super activations" in recent large language models, and it evolves alongside re-generalization. The magnitude of the large activation correlated with input patterns but not with output patterns. These empirical findings directly link the recent key phenomena of "deep double descent," "benign overfitting," and "large activation," and support the proposal of a novel scenario for understanding deep double descent.
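The loss-decomposition step described in the abstract (splitting the training loss into clean and noisy contributions) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 30% label-flip procedure and the random "predictions" are stand-ins, and the key point is only that the total loss is the sample-weighted sum of the per-subset losses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n examples, 10 classes (as in CIFAR-10), 30% label noise.
n, num_classes = 1000, 10
true_labels = rng.integers(0, num_classes, size=n)

# Flip ~30% of labels to a random *different* class.
noise_mask = rng.random(n) < 0.3
noisy_labels = true_labels.copy()
offsets = rng.integers(1, num_classes, size=noise_mask.sum())
noisy_labels[noise_mask] = (true_labels[noise_mask] + offsets) % num_classes

# Stand-in for model outputs: softmax over random logits.
logits = rng.normal(size=(n, num_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Per-example cross-entropy against the (possibly corrupted) training labels.
per_example_loss = -np.log(probs[np.arange(n), noisy_labels])

# Decompose the training loss into clean and noisy contributions.
loss_clean = per_example_loss[~noise_mask].mean()
loss_noisy = per_example_loss[noise_mask].mean()
loss_total = per_example_loss.mean()
```

Tracking `loss_clean` and `loss_noisy` separately over epochs is what reveals that clean data are fit first and that the second descent coincides with memorization of the noisy subset.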

URL

https://arxiv.org/abs/2601.08316

PDF

https://arxiv.org/pdf/2601.08316.pdf

