Data-free Knowledge Distillation for Fine-grained Visual Categorization

2024-04-18 09:44:56
Renrong Shao, Wei Zhang, Jianhua Yin, Jun Wang

Abstract

Data-free knowledge distillation (DFKD) is a promising approach for addressing issues related to model compression, security, privacy, and transmission restrictions. Although existing DFKD methods have achieved encouraging results in coarse-grained classification, they yield sub-optimal results in practical fine-grained classification tasks, which require more detailed distinctions between similar categories. To address this issue, we propose an approach called DFKD-FGVC that extends DFKD to fine-grained visual categorization (FGVC) tasks. Our approach utilizes an adversarial distillation framework with an attention generator, mixed high-order attention distillation, and semantic feature contrast learning. Specifically, we introduce a spatial-wise attention mechanism to the generator to synthesize fine-grained images with more details of discriminative parts. We also utilize the mixed high-order attention mechanism to capture complex interactions among parts and the subtle differences among discriminative features of the fine-grained categories, paying attention to both local features and semantic context relationships. Moreover, we leverage the teacher and student models of the distillation framework to contrast high-level semantic feature maps in the hyperspace, comparing variances of different categories. We evaluate our approach on three widely used FGVC benchmarks (Aircraft, Cars196, and CUB200) and demonstrate its superior performance.
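
The adversarial distillation setup named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the loss, so everything here is an assumption: the function names (`softmax`, `discrepancy`, `student_loss`, `generator_loss`) are hypothetical, and the mean absolute difference between class probabilities stands in for whatever objective DFKD-FGVC actually uses. The sketch only shows the opposing roles: the student tries to match the teacher on synthesized inputs, while the generator tries to produce inputs on which they disagree.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discrepancy(teacher_logits, student_logits):
    """Mean absolute difference between teacher and student probabilities.

    Assumed stand-in for the distillation objective; not taken from the paper.
    """
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    diffs = [abs(t - s) for t, s in zip(p_teacher, p_student)]
    return sum(diffs) / len(diffs)


def student_loss(teacher_logits, student_logits):
    """The student minimizes its disagreement with the teacher."""
    return discrepancy(teacher_logits, student_logits)


def generator_loss(teacher_logits, student_logits):
    """The generator maximizes that disagreement, so its loss is the negation."""
    return -discrepancy(teacher_logits, student_logits)


if __name__ == "__main__":
    # Made-up logits for one synthesized image over three classes.
    teacher = [4.0, 1.0, 0.5]
    student = [1.0, 3.0, 0.5]
    print("student loss:", student_loss(teacher, student))
    print("generator loss:", generator_loss(teacher, student))
```

In a full training loop the two losses would be applied in alternation, with the teacher frozen; the attention generator, high-order attention distillation, and feature contrast terms described above are not modeled in this sketch.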

Abstract (translated)

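Data-free knowledge distillation (DFKD) is a promising approach to problems of model compression, privacy, and security, particularly in practical applications that involve fine-grained classification of similar categories. Although existing DFKD-based methods have achieved encouraging results, their results are often sub-optimal in practical applications involving fine-grained classification tasks. To address this problem, we propose a method called DFKD-FGVC, which extends DFKD to fine-grained visual categorization (FGVC) tasks. Our method uses an attention generator, mixed high-order attention distillation, and semantic feature contrast learning. Specifically, we introduce a spatial-wise attention mechanism into the generator to synthesize fine-grained images with more detail in the discriminative parts. We also use a mixed high-order attention mechanism to capture complex interactions among parts and subtle differences among the discriminative features of fine-grained categories, attending to both local features and semantic context relationships. In addition, we use the teacher and student models of the distillation framework to contrast high-level semantic feature maps in the hyperspace, comparing the differences among categories. We evaluate our method on three widely used FGVC benchmarks (Aircraft, Cars196, and CUB200) and demonstrate its superior performance.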

URL

https://arxiv.org/abs/2404.12037

PDF

https://arxiv.org/pdf/2404.12037.pdf

