Paper Reading AI Learner

Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval

2023-03-16 11:09:22
Yi Xie, Huaidong Zhang, Xuemiao Xu, Jianqing Zhu, Shengfeng He

Abstract

Previous knowledge-distillation-based efficient image retrieval methods employ a lightweight network as the student model for fast inference. However, such a lightweight student lacks the representation capacity to imitate the teacher's knowledge effectively during the most critical early training period, which degrades final performance. To tackle this issue, we propose a Capacity Dynamic Distillation framework that constructs a student model with editable representation capacity. Specifically, the student starts as a heavy model so that it can fully absorb the distilled knowledge in the early training epochs, and is then gradually compressed as training proceeds. To adjust the model capacity dynamically, our framework inserts a learnable convolutional layer into each residual block of the student as a channel importance indicator. The indicator is optimized jointly by the image retrieval loss and the compression loss, and a retrieval-guided gradient resetting mechanism is proposed to resolve the conflict between their gradients. Extensive experiments show that our method achieves superior inference speed and accuracy: on the VeRi-776 dataset, with ResNet101 as the teacher, our method saves 67.13% of model parameters and 65.67% of FLOPs (around 24.13% and 21.94% more than the state of the art) without sacrificing accuracy (around 2.11% mAP higher than the state of the art).
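The abstract only outlines the mechanism, so here is a minimal PyTorch sketch of how the channel importance indicator and the retrieval-guided gradient reset might look. Everything below is an assumption for illustration: the `ChannelIndicator` gate (a per-channel 1x1 convolution), the L1 form of the compression loss, and the sign-based conflict rule are one plausible reading of the abstract, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class ChannelIndicator(nn.Module):
    """Learnable 1x1 convolution used as a per-channel importance gate,
    inserted into each residual block of the student (illustrative sketch)."""

    def __init__(self, channels: int):
        super().__init__()
        # groups=channels ties each output channel to one input channel,
        # so each weight acts as a scalar gate on that channel.
        self.gate = nn.Conv2d(channels, channels, kernel_size=1,
                              groups=channels, bias=False)
        nn.init.ones_(self.gate.weight)  # start fully open (heavy student)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gate(x)


def compression_loss(indicators) -> torch.Tensor:
    # L1 sparsity on the gates pushes unimportant channels toward zero;
    # channels whose gates vanish can later be pruned away.
    return sum(ind.gate.weight.abs().sum() for ind in indicators)


def train_step(retrieval_loss, indicators, optimizer, lam=1e-4):
    """One optimization step with a retrieval-guided gradient reset:
    wherever the compression gradient opposes the retrieval gradient on
    a gate weight, the compression gradient is zeroed out (assumed rule)."""
    gates = [ind.gate.weight for ind in indicators]

    optimizer.zero_grad()
    retrieval_loss.backward()  # fills .grad on all params, gates included

    # The compression loss depends only on the gate weights (leaf tensors),
    # so its gradients come from a separate, tiny graph.
    g_cmp = torch.autograd.grad(lam * compression_loss(indicators), gates)

    for w, gc in zip(gates, g_cmp):
        gc = gc.clone()
        gc[(gc * w.grad) < 0] = 0.0  # conflict: retrieval gradient wins
        w.grad += gc                 # otherwise the two signals add up
    optimizer.step()
```

After training, channels whose gate magnitudes fall below some threshold would be pruned to obtain the compact student; the thresholding and pruning policy here is again an assumption, since the abstract does not spell it out.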

URL

https://arxiv.org/abs/2303.09230

PDF

https://arxiv.org/pdf/2303.09230.pdf

