A Simple and Generic Framework for Feature Distillation via Channel-wise Transformation

2023-03-23 12:13:29
Ziwei Liu, Yongtao Wang, Xiaojie Chu

Abstract

Knowledge distillation is a popular technique for transferring knowledge from a large teacher model to a smaller student model by having the student mimic the teacher. However, distillation that directly aligns the feature maps of teacher and student may impose overly strict constraints on the student and thus degrade its performance. To alleviate this feature misalignment issue, existing works mainly focus on spatially aligning the feature maps of the teacher and the student via pixel-wise transformations. In this paper, we find that aligning the feature maps of teacher and student along the channel dimension is also effective for addressing the feature misalignment issue. Specifically, we propose a learnable nonlinear channel-wise transformation to align the features of the student with those of the teacher. Based on this transformation, we further propose a simple and generic framework for feature distillation with only one hyper-parameter, which balances the distillation loss and the task-specific loss. Extensive experimental results show that our method achieves significant performance improvements across a variety of computer vision tasks, including image classification (+3.28% top-1 accuracy for MobileNetV1 on ImageNet-1K), object detection (+3.9% bbox mAP for ResNet50-based Faster R-CNN on MS COCO), instance segmentation (+2.8% mask mAP for ResNet50-based Mask R-CNN), and semantic segmentation (+4.66% mIoU for ResNet18-based PSPNet on Cityscapes), demonstrating the effectiveness and versatility of the proposed method. The code will be made publicly available.
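
The abstract describes the method only at a high level; the sketch below illustrates the core idea in PyTorch. This is a hypothetical sketch, not the authors' released implementation: the transformation architecture, the MSE matching loss, and the names (ChannelWiseTransform, total_loss, alpha) are all illustrative assumptions, with alpha standing in for the single balancing hyper-parameter mentioned in the abstract.

```python
# Hypothetical sketch of channel-wise feature distillation (not the authors' code).
# A learnable nonlinear transformation mixes the student's channels (1x1 convolutions
# act along the channel dimension only) before matching the teacher's feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelWiseTransform(nn.Module):
    """Learnable nonlinear transformation over the channel dimension."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(student_channels, teacher_channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(teacher_channels, teacher_channels, kernel_size=1),
        )

    def forward(self, student_feat: torch.Tensor) -> torch.Tensor:
        return self.transform(student_feat)


def total_loss(student_feat, teacher_feat, task_loss, transform, alpha=1.0):
    """Task loss plus alpha-weighted feature-distillation loss.

    Assumes the student and teacher feature maps share the same spatial size.
    """
    aligned = transform(student_feat)
    distill_loss = F.mse_loss(aligned, teacher_feat.detach())
    return task_loss + alpha * distill_loss
```

For example, with a student feature map of shape (N, 256, H, W) and a teacher feature map of shape (N, 512, H, W), `ChannelWiseTransform(256, 512)` lifts the student features into the teacher's channel space before the matching loss is computed, and `alpha` is the single hyper-parameter that trades off the distillation loss against the task-specific loss.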

URL

https://arxiv.org/abs/2303.13212

PDF

https://arxiv.org/pdf/2303.13212.pdf

