Abstract
Post-training quantization (PTQ) is widely regarded as one of the most practical and efficient compression methods, owing to its data privacy and low computational cost. We argue that oscillation is an overlooked problem in existing PTQ methods. In this paper, we take the initiative to explore this problem and present a theoretical proof explaining why it is essential in PTQ. We then address it by introducing a principled and generalized framework. Specifically, we first formulate oscillation in PTQ and prove that the problem is caused by differences in module capacity. To this end, we define module capacity (ModCap) under both data-dependent and data-free scenarios, where the differentials between adjacent modules are used to measure the degree of oscillation. The problem is then solved by selecting the top-k differentials, whose corresponding modules are jointly optimized and quantized. Extensive experiments demonstrate that our method successfully reduces the performance drop and generalizes to different neural networks and PTQ methods. For example, with 2/4-bit ResNet-50 quantization, our method surpasses the previous state-of-the-art method by 1.9%. The improvement is more significant for small-model quantization, e.g., surpassing BRECQ by 6.61% on MobileNetV2*0.5.
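The selection step described above — measuring capacity differentials between adjacent modules and picking the top-k pairs for joint optimization — can be sketched as follows. This is a minimal illustration under our own assumptions; the function name `select_joint_modules` and the list-based representation of per-module capacities are hypothetical, not the paper's actual implementation.

```python
def select_joint_modules(capacities, k):
    """Given per-module ModCap values (in network order), return the indices
    of the k adjacent module pairs with the largest capacity differentials,
    i.e. the pairs most prone to oscillation. Index i denotes the pair
    (module i, module i+1), which would be jointly optimized and quantized."""
    # Differential between each pair of adjacent modules.
    diffs = [abs(capacities[i + 1] - capacities[i])
             for i in range(len(capacities) - 1)]
    # Rank pair indices by differential, descending, and keep the top-k.
    topk = sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)[:k]
    return sorted(topk)

# Toy example: pairs (0,1) and (2,3) have the largest capacity gaps.
print(select_joint_modules([1.0, 3.5, 3.4, 0.5, 2.0], k=2))  # -> [0, 2]
```

The remaining (unselected) modules would then be optimized individually, as in standard block-wise PTQ pipelines.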
URL
https://arxiv.org/abs/2303.11906