Paper Reading AI Learner

GhostNetV3: Exploring the Training Strategies for Compact Models

2024-04-17 09:33:31
Zhenhua Liu, Zhiwei Hao, Kai Han, Yehui Tang, Yunhe Wang

Abstract

Compact neural networks are specially designed for applications on edge devices, offering faster inference at the cost of modest performance. However, current training strategies for compact models are simply borrowed from those of conventional models, which ignores the difference in model capacity and may therefore impede the performance of compact models. In this paper, by systematically investigating the impact of different training ingredients, we introduce a strong training strategy for compact models. We find that appropriate designs of re-parameterization and knowledge distillation are crucial for training high-performance compact models, while some data augmentations commonly used for training conventional models, such as Mixup and CutMix, lead to worse performance. Our experiments on the ImageNet-1K dataset demonstrate that our specialized training strategy for compact models is applicable to various architectures, including GhostNetV2, MobileNetV2 and ShuffleNetV2. Specifically, equipped with our strategy, GhostNetV3 1.3$\times$ achieves a top-1 accuracy of 79.1% with only 269M FLOPs and a latency of 14.46 ms on mobile devices, surpassing its ordinarily trained counterpart by a large margin. Moreover, our observations also extend to object detection scenarios. PyTorch code and checkpoints can be found at this https URL.
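
To illustrate the re-parameterization ingredient the abstract highlights, here is a minimal PyTorch sketch of a training-time multi-branch block (in the RepVGG style) that fuses into a single convolution for deployment. The block layout and names are illustrative assumptions, not GhostNetV3's actual modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepConvBlock(nn.Module):
    """Training-time block with parallel 3x3 and 1x1 branches.

    At inference the 1x1 branch is folded into the 3x3 kernel,
    so the deployed model pays for a single convolution only.
    """

    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=True)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=1, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv3(x) + self.conv1(x)

    def fuse(self) -> nn.Conv2d:
        """Return a single 3x3 conv equivalent to the two-branch block."""
        fused = nn.Conv2d(self.conv3.in_channels, self.conv3.out_channels,
                          kernel_size=3, padding=1, bias=True)
        with torch.no_grad():
            # A 1x1 kernel equals a 3x3 kernel that is zero everywhere but the center.
            fused.weight.copy_(self.conv3.weight + F.pad(self.conv1.weight, [1, 1, 1, 1]))
            fused.bias.copy_(self.conv3.bias + self.conv1.bias)
        return fused

block = RepConvBlock(8).eval()
x = torch.randn(1, 8, 32, 32)
with torch.no_grad():
    assert torch.allclose(block(x), block.fuse()(x), atol=1e-6)
```

Real re-parameterized blocks typically also carry BatchNorm in each branch, whose statistics are folded into the kernels before fusing; that step is omitted here for brevity.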

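The knowledge-distillation ingredient can likewise be sketched with the standard temperature-scaled KL formulation. The teacher choice, temperature and loss weighting used in the paper are its own; `T` and `alpha` below are placeholder values for illustration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 4.0,
                      alpha: float = 0.9) -> torch.Tensor:
    """Blend temperature-softened teacher targets with the hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so the gradient magnitude is independent of T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage: the teacher runs in eval mode with no gradient tracking.
student_logits = torch.randn(4, 1000, requires_grad=True)
with torch.no_grad():
    teacher_logits = torch.randn(4, 1000)
labels = torch.randint(0, 1000, (4,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```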

URL

https://arxiv.org/abs/2404.11202

PDF

https://arxiv.org/pdf/2404.11202.pdf

