
Knowledge Diffusion for Distillation

2023-05-25 04:49:34
Tao Huang, Yuan Zhang, Mingkai Zheng, Shan You, Fei Wang, Chen Qian, Chang Xu

Abstract

The representation gap between teacher and student is an emerging topic in knowledge distillation (KD). To reduce the gap and improve performance, current methods often resort to complicated training schemes, loss functions, and feature alignments, which are task-specific and feature-specific. In this paper, we argue that the essence of these methods is to discard the noisy information and distill the valuable information in the feature, and propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models. Our approach is based on the observation that student features typically contain more noise than teacher features due to the smaller capacity of the student model. To address this, we propose to denoise student features using a diffusion model trained on teacher features. This allows us to perform better distillation between the refined clean feature and the teacher feature. Additionally, we introduce a lightweight diffusion model with a linear autoencoder to reduce the computational cost and an adaptive noise matching module to improve the denoising performance. Extensive experiments demonstrate that DiffKD is effective across various types of features and achieves state-of-the-art performance consistently on image classification, object detection, and semantic segmentation tasks. Code will be available at this https URL.
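
Since the abstract only sketches the mechanism, the following is a minimal, assumption-laden PyTorch sketch of the core idea, not the authors' implementation: a small noise-prediction network is trained on teacher features with a standard DDPM objective, the student feature is treated as a noisy sample and refined with a few deterministic (DDIM-style) reverse steps, and the distillation loss is taken between the refined feature and the teacher feature. All class and function names, the noise schedule, and the number of reverse steps are illustrative; the paper's linear autoencoder and adaptive noise matching modules are omitted here.

# Illustrative sketch only; names, schedule, and sampler are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyDenoiser(nn.Module):
    """Lightweight noise-prediction network on feature maps (illustrative only)."""

    def __init__(self, channels: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # A real diffusion model conditions on the timestep t; this sketch ignores it.
        return self.net(x)


def denoiser_training_loss(denoiser, teacher_feat, alphas_cumprod):
    """Train the denoiser on teacher features with a DDPM noise-prediction loss."""
    b = teacher_feat.size(0)
    t = torch.randint(0, len(alphas_cumprod), (b,), device=teacher_feat.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(teacher_feat)
    noisy = a_bar.sqrt() * teacher_feat + (1 - a_bar).sqrt() * noise
    return F.mse_loss(denoiser(noisy, t), noise)


def diffkd_loss(denoiser, student_feat, teacher_feat, alphas_cumprod, t_start=5):
    """Treat the student feature as a noisy sample at step t_start, denoise it with
    a few deterministic reverse steps, then distill against the teacher feature."""
    x = student_feat
    for t in range(t_start, 0, -1):
        tt = torch.full((x.size(0),), t, device=x.device, dtype=torch.long)
        eps = denoiser(x, tt)
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t - 1]
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # predicted clean feature
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
    return F.mse_loss(x, teacher_feat)


if __name__ == "__main__":
    channels = 64
    denoiser = TinyDenoiser(channels)
    betas = torch.linspace(1e-4, 0.02, 50)
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    teacher_feat = torch.randn(2, channels, 8, 8)  # stand-in teacher feature map
    student_feat = torch.randn(2, channels, 8, 8)  # stand-in (channel-aligned) student feature
    print(denoiser_training_loss(denoiser, teacher_feat, alphas_cumprod).item())
    print(diffkd_loss(denoiser, student_feat, teacher_feat, alphas_cumprod).item())

In an actual training loop, the denoiser would be optimized jointly with the student, and the resulting distillation loss added to the task loss; the two helper functions above are only meant to show where each piece would plug in.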

Abstract (translated)

The representation gap between teacher and student is an emerging topic in knowledge distillation (KD). To narrow this gap and improve performance, current methods often adopt complicated training schemes, loss functions, and feature alignments, which are task- and feature-specific. In this paper, we point out that the essence of these methods is to discard noisy information and distill the valuable information in the feature, and we propose a new KD method called DiffKD that uses diffusion models to explicitly denoise and match features. Our approach is based on the observation that student features typically contain more noise than teacher features because of the student model's smaller capacity. To address this, we propose denoising student features with a diffusion model trained on teacher features, which enables better distillation between the refined clean feature and the teacher feature. In addition, we introduce a lightweight diffusion model with a linear autoencoder to reduce the computational cost, and an adaptive noise matching module to improve denoising performance. Extensive experiments show that DiffKD works across various types of features and consistently achieves state-of-the-art performance on image classification, object detection, and semantic segmentation tasks. Code will be made available at the linked URL.

URL

https://arxiv.org/abs/2305.15712

PDF

https://arxiv.org/pdf/2305.15712.pdf

