Abstract
We present a theoretical and experimental investigation of the quantization problem for artificial neural networks. We provide a mathematical definition of quantized neural networks and analyze their approximation capabilities, showing in particular that any Lipschitz-continuous map defined on a hypercube can be uniformly approximated by a quantized neural network. We then focus on the regularization effect of additive noise on the arguments of multi-step functions inherent to the quantization of continuous variables. In particular, when the expectation operator is applied to a non-differentiable multi-step random function, and if the underlying probability density is differentiable (in either classical or weak sense), then a differentiable function is retrieved, with explicit bounds on its Lipschitz constant. Based on these results, we propose a novel gradient-based training algorithm for quantized neural networks that generalizes the straight-through estimator, acting on noise applied to the network's parameters. We evaluate our algorithm on the CIFAR-10 and ImageNet image classification benchmarks, showing state-of-the-art performance on AlexNet and MobileNetV2 for ternary networks.
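The abstract's central smoothing result can be illustrated numerically: taking the expectation of a non-differentiable multi-step quantizer under additive noise with a (weakly) differentiable density yields a Lipschitz-continuous function. The sketch below uses a hard ternary quantizer with illustrative thresholds at ±0.5 and uniform noise; these specific choices are assumptions for the example, not details taken from the paper.

```python
import numpy as np

# Hard ternary quantizer: a non-differentiable multi-step function.
# Thresholds at +/-0.5 are an illustrative choice for this sketch.
def quantize(x):
    return np.where(x > 0.5, 1.0, np.where(x < -0.5, -1.0, 0.0))

# Monte-Carlo estimate of E[quantize(x + eps)], eps ~ Uniform(-s, s).
# The expectation over the noise smooths the steps: with uniform noise
# of half-width s, each unit jump of the quantizer contributes a ramp
# of slope 1/(2s), giving an explicit Lipschitz bound.
def smoothed_quantize(x, s=0.5, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    eps = rng.uniform(-s, s, size=n)
    return quantize(x + eps).mean()
```

At the threshold x = 0.5 the smoothed function passes through the midpoint 0.5 of the jump, rather than switching discontinuously from 0 to 1; its finite slope is what makes a gradient-based training signal available where the raw quantizer has derivative zero almost everywhere.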
URL
https://arxiv.org/abs/1905.10452