And the Bit Goes Down: Revisiting the Quantization of Neural Networks

2019-07-12 11:52:54

Pierre Stock, Armand Joulin, Rémi Gribonval, Benjamin Graham, Hervé Jégou

arXiv_CV CNN Classification Inference Quantization Reconstruction

Abstract

In this paper, we address the problem of reducing the memory footprint of ResNet-like convolutional network architectures. We introduce a vector quantization method that aims to preserve the quality of the reconstruction of the network's outputs rather than that of its weights. The advantage of our approach is that it minimizes the reconstruction error for in-domain inputs and does not require any labelled data. We also use byte-aligned codebooks to produce compressed networks with efficient inference on CPU. We validate our approach by quantizing a high-performing ResNet-50 model to a memory size of 5 MB (20x compression factor) while preserving a top-1 accuracy of 76.1% on ImageNet object classification, and by compressing a Mask R-CNN with a size budget of around 6 MB.
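To make the distinction between reconstructing weights and reconstructing outputs concrete, here is a minimal NumPy sketch of a single codebook-assignment step. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy shapes, the random data, and the choice of 256 centroids (so that each index fits in one byte, which is one reading of "byte-aligned") are all hypothetical. `X` stands in for a batch of unlabelled in-domain input activations.

```python
import numpy as np

def assign_by_weights(V, C):
    """Standard assignment: nearest centroid in weight space, ||v - c||^2."""
    # V: (p, d) weight subvectors, C: (k, d) centroids
    diff = V[:, None, :] - C[None, :, :]           # (p, k, d)
    return np.argmin((diff ** 2).sum(-1), axis=1).astype(np.uint8)

def assign_by_outputs(V, C, X):
    """Output-aware assignment: minimizes ||X (v - c)||^2 per subvector."""
    # X: (n, d) input activations (assumed shared by all subvectors here)
    diff = V[:, None, :] - C[None, :, :]           # (p, k, d)
    proj = diff @ X.T                              # (p, k, n) output differences
    return np.argmin((proj ** 2).sum(-1), axis=1).astype(np.uint8)

def output_error(V, C, codes, X):
    """Total squared error on the layer outputs after quantization."""
    return float((((V - C[codes]) @ X.T) ** 2).sum())

# Toy data (hypothetical sizes): k = 256 centroids, so one uint8 per index.
rng = np.random.default_rng(0)
d, k, p, n = 8, 256, 512, 32
X = rng.normal(size=(n, d)) * np.linspace(0.1, 3.0, d)  # anisotropic inputs
V = rng.normal(size=(p, d))
C = rng.normal(size=(k, d))

codes_w = assign_by_weights(V, C)
codes_o = assign_by_outputs(V, C, X)
err_w = output_error(V, C, codes_w, X)
err_o = output_error(V, C, codes_o, X)
print(f"output error, weight-space assignment: {err_w:.1f}")
print(f"output error, output-aware assignment: {err_o:.1f}")
```

For a fixed codebook, the output-aware assignment can never give a larger output error than the weight-space one, because it picks the minimizer of that error for each subvector independently. A full method would also update the centroids and process the network layer by layer; those steps are omitted from this sketch.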

URL

https://arxiv.org/abs/1907.05686

PDF

https://arxiv.org/pdf/1907.05686.pdf