BasisConv: A method for compressed representation and learning in CNNs

2019-06-11 12:07:48
Muhammad Tayyab, Abhijit Mahalanobis

Abstract

It is well known that Convolutional Neural Networks (CNNs) have significant redundancy in their filter weights. Various methods have been proposed in the literature to compress trained CNNs. These include techniques such as pruning weights, quantizing filters, and representing filters in terms of basis functions. Our approach falls in the last of these classes, but is distinct in that we show that both compressed learning and compressed representation can be achieved without significant modifications to popular CNN architectures. Specifically, any convolution layer of the CNN is easily replaced by two successive convolution layers: the first is a set of fixed filters (which represent the knowledge space of the entire layer and do not change), followed by a layer of one-dimensional filters (which represent the learned knowledge in this space). For pre-trained networks, the fixed layer is just the truncated eigen-decomposition of the original filters. The 1D filters are initialized as the weights of the linear combination, and are then fine-tuned to recover any performance lost to the truncation. For training networks from scratch, we use a set of random orthogonal fixed filters (which never change), and learn the 1D weight vector directly from the labeled data. Our method substantially reduces i) the number of learnable parameters during training, and ii) the number of multiplication operations and the filter storage requirements during implementation. It does so without requiring any special operators in the convolution layer, and extends to all known popular CNN architectures. We apply our method to four well-known network architectures trained with three different data sets. Results show a consistent reduction in i) the number of operations by up to a factor of 5, and ii) the number of learnable parameters by up to a factor of 18, with less than a 3% drop in performance on the CIFAR100 dataset.
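The two-layer replacement described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`basis_decompose`, `random_orthogonal_basis`, `conv2d_valid`), the tensor shapes, the use of an SVD of the flattened filter matrix to obtain the eigenvectors, and the absence of mean subtraction are all assumptions made for illustration. It shows a bank of filters split into a fixed basis plus per-filter 1D coefficients, and checks that applying the basis filters followed by the linear combination reproduces the original convolution when no components are truncated.

```python
import numpy as np


def basis_decompose(weights, num_basis):
    """Split filters (n_out, c_in, k, k) into a fixed basis and 1D coefficients.

    Assumption: the basis is the top `num_basis` right singular vectors of the
    flattened filter matrix, i.e. eigenvectors of flat.T @ flat.
    """
    n_out = weights.shape[0]
    flat = weights.reshape(n_out, -1)                  # (n_out, D)
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    basis_flat = vt[:num_basis]                        # (Q, D), orthonormal rows
    coeffs = flat @ basis_flat.T                       # (n_out, Q) combination weights
    basis = basis_flat.reshape((num_basis,) + weights.shape[1:])
    return basis, coeffs


def random_orthogonal_basis(shape, num_basis, rng):
    """Fixed random orthonormal filters for training from scratch (hypothetical helper)."""
    dim = int(np.prod(shape))
    q, _ = np.linalg.qr(rng.standard_normal((dim, num_basis)))  # orthonormal columns
    return q.T.reshape((num_basis,) + tuple(shape))


def conv2d_valid(x, weights):
    """Naive 'valid' cross-correlation. x: (c_in, H, W), weights: (n, c_in, k, k)."""
    n, _, k, _ = weights.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.empty((n, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]
            out[:, i, j] = np.tensordot(weights, patch, axes=3)
    return out


rng = np.random.default_rng(0)
weights = rng.standard_normal((16, 4, 3, 3))   # stand-in for a trained layer
x = rng.standard_normal((4, 8, 8))             # one input feature map

reference = conv2d_valid(x, weights)

# Keeping all 16 components: fixed basis conv, then linear combination, is exact.
basis, coeffs = basis_decompose(weights, 16)
exact = np.tensordot(coeffs, conv2d_valid(x, basis), axes=1)

# Truncating to 8 components: fewer learnable values, approximate output.
basis8, coeffs8 = basis_decompose(weights, 8)
approx = np.tensordot(coeffs8, conv2d_valid(x, basis8), axes=1)

# Random orthogonal fixed filters, as an alternative to the eigen-basis.
rand_basis = random_orthogonal_basis((4, 3, 3), 8, rng)
```

In this toy setting the original layer has 16 × 36 = 576 weights. With 8 basis filters, only the 16 × 8 = 128 coefficients would be learnable, while the 8 × 36 = 288 basis values stay fixed. In a real network the linear-combination step would correspond to the layer of one-dimensional filters that is fine-tuned after truncation.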

URL

https://arxiv.org/abs/1906.04509

PDF

https://arxiv.org/pdf/1906.04509.pdf

