Paper Reading AI Learner

DepGraph: Towards Any Structural Pruning

2023-01-30 14:02:33
Gongfan Fang, Xinyin Ma, Mingli Song, Michael Bi Mi, Xinchao Wang

Abstract

Structural pruning enables model acceleration by removing structurally-grouped parameters from neural networks. However, the parameter-grouping patterns vary widely across different models, making architecture-specific pruners, which rely on manually-designed grouping schemes, non-generalizable to new architectures. In this work, we study a highly-challenging yet barely-explored task, any structural pruning, to tackle general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs and Transformers. The most prominent obstacle towards this ambitious goal lies in the structural coupling, which not only forces different layers to be pruned simultaneously, but also expects all parameters in a removed group to be consistently unimportant, thereby avoiding significant performance degradation after pruning. To address this problem, we propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the inter-dependency between layers and comprehensively group coupled parameters. We extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision Transformers for images, GAT for graphs, DGCNN for 3D point clouds, and LSTM for language, and demonstrate that, even with a simple L1-norm criterion, the proposed method consistently yields gratifying performance.
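To make the core idea concrete, here is a minimal, library-free sketch (not the authors' implementation) of the grouped pruning the abstract describes: layers whose channels are structurally coupled (e.g. by a residual connection) must be pruned together, so per-channel L1-norm importance is aggregated across the whole group and the same least-important channels are removed from every coupled layer. All names and the list-based weight representation are illustrative assumptions.

```python
# Illustrative sketch of grouped structural pruning with an L1-norm criterion.
# Weights are plain nested lists [out_channel][in_feature]; names are hypothetical.

def l1_per_channel(weight):
    """L1 norm of each output channel (row) of one layer's weight matrix."""
    return [sum(abs(w) for w in row) for row in weight]

def group_importance(weights):
    """Aggregate channel-wise L1 norms across all layers coupled in one group,
    since structural coupling ties their channel counts together."""
    norms = [l1_per_channel(w) for w in weights]
    return [sum(channel) for channel in zip(*norms)]

def prune_group(weights, keep):
    """Remove the same lowest-importance channels from every coupled layer."""
    imp = group_importance(weights)
    # indices of the `keep` most important channels, original order preserved
    kept = sorted(sorted(range(len(imp)), key=lambda i: -imp[i])[:keep])
    return [[w[i] for i in kept] for w in weights], kept

# Two layers whose output channels are structurally coupled (3 channels each).
layer_a = [[1.0, -2.0], [0.1, 0.1], [3.0, 1.0]]
layer_b = [[0.5, 0.5], [0.0, 0.2], [1.0, -1.0]]
pruned, kept = prune_group([layer_a, layer_b], keep=2)
print(kept)  # channel 1 has the smallest aggregated L1 norm, so channels 0 and 2 survive
```

The key property illustrated is that importance is judged at the group level, not per layer: a channel kept in one coupled layer is kept in all of them, which is what DepGraph automates for arbitrary architectures.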

Abstract (translated)

Structural pruning accelerates models by removing structurally-grouped parameters from neural networks. However, parameter-grouping patterns differ greatly across models, which makes architecture-specific pruners that rely on manually-designed grouping schemes non-generalizable to new architectures. In this work, we study a challenging yet barely-explored task, any structural pruning, to address general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs and Transformers. The main obstacle to this goal is structural coupling, which not only forces different layers to be pruned simultaneously but also requires all parameters in a removed group to be consistently unimportant, so that pruning does not cause a significant performance drop. To solve this problem, we propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependencies between layers and comprehensively group coupled parameters. We extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision Transformers for images, GAT for graphs, DGCNN for 3D point clouds, and LSTM for language, and show that even with a simple L1-norm criterion the proposed method consistently delivers satisfying performance.

URL

https://arxiv.org/abs/2301.12900

PDF

https://arxiv.org/pdf/2301.12900.pdf

