UWC: Unit-wise Calibration Towards Rapid Network Compression

2022-01-17 12:27:35

Chen Lin, Zheyang Li, Bo Peng, Haoji Hu, Wenming Tan, Ye Ren, Shiliang Pu

arXiv_CV

Abstract

This paper introduces a post-training quantization (PTQ) method that achieves highly efficient, high-accuracy quantization of Convolutional Neural Networks (CNNs). Previous PTQ methods usually reduce compression error by calibrating parameters layer by layer. However, because extremely compressed parameters have limited representational ability (e.g., when the bit-width drops below 4), it is hard to eliminate all the layer-wise errors. This work addresses the issue by proposing a unit-wise feature reconstruction algorithm, motivated by a second-order Taylor series expansion of the unit-wise error. The expansion indicates that exploiting the interaction between the parameters of adjacent layers compensates for layer-wise errors better. In this paper, we define several adjacent layers as a Basic-Unit and present a unit-wise post-training algorithm designed to minimize quantization error. The method achieves near-original accuracy on ImageNet and COCO when quantizing FP32 models to INT4 and INT3.
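To make the layer-wise versus unit-wise distinction concrete, here is a minimal NumPy sketch on a hypothetical two-layer Basic-Unit (Linear, ReLU, Linear). It assumes a per-tensor symmetric 3-bit quantizer and treats calibration as deciding whether each weight rounds down or up, searched exhaustively. This AdaRound-style rounding search is swapped in purely for illustration and is not the paper's optimizer; all names (`dequant`, `unit_err`), layer sizes and the seed are illustrative assumptions. Layer-wise calibration matches each layer's own output, while unit-wise calibration matches the unit's final output, so one layer's rounding can offset the other layer's error.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
BITS = 3  # assumed low bit-width, mirroring the INT3 setting

def dequant(w, mask, bits=BITS):
    """Per-tensor symmetric quantizer (assumed): round each weight down
    (mask=0) or up (mask=1) on the integer grid, clip, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.floor(w / scale) + np.asarray(mask).reshape(w.shape)
    return np.clip(q, -qmax - 1, qmax) * scale

def masks(n):
    """All round-down/round-up choices for n weights."""
    return itertools.product((0, 1), repeat=n)

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical toy Basic-Unit: Linear -> ReLU -> Linear
W1 = rng.normal(size=(2, 2))
W2 = rng.normal(size=(1, 2))
X = rng.normal(size=(2, 64))  # calibration inputs
H = relu(W1 @ X)              # full-precision input to layer 2
Y = W2 @ H                    # full-precision unit output

def unit_err(m1, m2):
    """MSE between the quantized unit's output and the FP unit's output."""
    Yq = dequant(W2, m2) @ relu(dequant(W1, m1) @ X)
    return float(np.mean((Yq - Y) ** 2))

# Layer-wise calibration: each layer only reconstructs its own output.
m1_layer = min(masks(W1.size),
               key=lambda m: np.mean((dequant(W1, m) @ X - W1 @ X) ** 2))
m2_layer = min(masks(W2.size),
               key=lambda m: np.mean((dequant(W2, m) @ H - Y) ** 2))
err_layer = unit_err(m1_layer, m2_layer)

# Unit-wise calibration: pick both layers' roundings jointly on the unit output.
best = min(itertools.product(masks(W1.size), masks(W2.size)),
           key=lambda p: unit_err(*p))
err_unit = unit_err(*best)

print(f"layer-wise MSE: {err_layer:.6f}  unit-wise MSE: {err_unit:.6f}")
```

Because the layer-wise choice is one of the candidates the joint search evaluates, `err_unit` can never exceed `err_layer`; how much lower it is depends on the random draw. Exhaustive search is only feasible at this toy size; a practical implementation would more typically optimize a continuous relaxation of the rounding decisions with gradient descent.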

URL

https://arxiv.org/abs/2201.06376

PDF

https://arxiv.org/pdf/2201.06376.pdf
