Paper Reading AI Learner

Rethinking Soft Label in Label Distribution Learning Perspective

2023-01-31 06:47:19
Seungbum Hong, Jihun Yoon, Bogyu Park, Min-Kook Choi

Abstract

The primary goal of training early convolutional neural networks (CNNs) was higher generalization performance. However, since the introduction of the expected calibration error (ECE), which quantifies the mismatch between a model's confidence and its accuracy, research has also turned toward training models whose predictions can be trusted and explained. We hypothesized that a gap between the supervision criteria used during training and inference leads to overconfidence, and investigated whether label distribution learning (LDL) would improve model calibration in CNN training. To verify this hypothesis, we used a simple LDL setup combined with recent data augmentation techniques. From a series of experiments, we obtained the following results: 1) state-of-the-art knowledge distillation (KD) methods significantly impede model calibration; 2) training with LDL and recent data augmentation substantially improves both model calibration and generalization performance; 3) online LDL brings additional gains in calibration and accuracy under long training schedules, especially for large models. Using the proposed approach, we simultaneously achieved lower ECE and higher generalization performance on the image classification datasets CIFAR10, CIFAR100, STL10, and ImageNet. We also performed several visualizations and analyses and observed several interesting behaviors of CNN training with LDL.
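
Since the abstract centers on ECE, a minimal sketch of how the metric is typically computed may help: predictions are grouped into confidence bins, and ECE is the bin-mass-weighted average gap between each bin's accuracy and its mean confidence. The function name, the NumPy-based formulation, and the bin count of 15 are illustrative choices, not taken from the paper.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """probs: (N, C) softmax outputs; labels: (N,) integer class labels."""
    confidences = probs.max(axis=1)        # top-1 confidence per sample
    predictions = probs.argmax(axis=1)     # top-1 predicted class
    accuracies = (predictions == labels).astype(float)

    ece = 0.0
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - mean confidence| in this bin, weighted by bin mass
            ece += in_bin.mean() * abs(accuracies[in_bin].mean()
                                       - confidences[in_bin].mean())
    return ece
```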
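The abstract's "simple LDL setting with recent data augmentation" is not detailed here, but a common way such augmentation yields label distributions is mixup, which blends two images and their one-hot labels. The following PyTorch sketch illustrates that idea under stated assumptions (the helper names, alpha=0.2, and the soft-label cross-entropy loss are illustrative); it shows LDL-style supervision in general, not the paper's exact method.

```python
import torch
import torch.nn.functional as F

def mixup_soft_labels(x, y, num_classes, alpha=0.2):
    """Mix a batch of images and return the corresponding soft label distributions."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_onehot = F.one_hot(y, num_classes).float()
    y_soft = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mixed, y_soft

def soft_label_loss(logits, y_soft):
    """Cross-entropy between predicted log-probabilities and soft label distributions."""
    return -(y_soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```

In a training loop, one would call `x_mixed, y_soft = mixup_soft_labels(images, targets, num_classes)` and optimize `soft_label_loss(model(x_mixed), y_soft)`, so the network is supervised by a distribution over classes rather than a one-hot label.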


URL

https://arxiv.org/abs/2301.13444

PDF

https://arxiv.org/pdf/2301.13444.pdf

