Abstract
To improve the throughput and energy efficiency of Deep Neural Networks (DNNs) on customized hardware, lightweight neural networks constrain the weights of DNNs to be a limited combination (denoted as $k\in\{1,2\}$) of powers of 2. In such networks, the multiply-accumulate operation can be replaced with a single shift operation ($k=1$), or with two shifts and an add operation ($k=2$). To provide even more design flexibility, the $k$ for each convolutional filter can be optimally chosen instead of being fixed for every filter. In this paper, we make the selection of $k$ differentiable, and describe model training for determining $k$-based weights on a per-filter basis. Over 46 FPGA-design experiments involving eight configurations and four data sets reveal that lightweight neural networks with a flexible $k$ value (dubbed FLightNNs) fully utilize the hardware resources on Field-Programmable Gate Arrays (FPGAs). Our experimental results show that FLightNNs achieve a 2$\times$ speedup when compared to lightweight NNs with $k=2$, with only 0.1\% accuracy degradation. Compared to 4-bit fixed-point quantization, FLightNNs achieve higher accuracy and up to 2$\times$ inference speedup, due to their lightweight shift operations. In addition, our experiments demonstrate that FLightNNs can achieve higher computational energy efficiency for ASIC implementations.
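The core arithmetic idea, multiplication by a weight that is a signed sum of at most $k$ powers of 2 reduced to shifts and adds, can be sketched as follows. This is a minimal illustration with hypothetical helper names and a simple greedy decomposition for integer weights; it is not the paper's training procedure, which selects $k$ per filter via a differentiable formulation.

```python
import math

def pow2_terms(w, k=2):
    """Greedily decompose an integer weight into at most k signed powers of 2.

    Returns a list of (sign, exponent) pairs such that
    w ≈ sum(sign * 2**exponent). Hypothetical helper, not from the paper.
    """
    terms, approx = [], 0
    for _ in range(k):
        r = w - approx
        if r == 0:
            break
        e = max(0, round(math.log2(abs(r))))  # nearest exponent in log scale
        s = 1 if r > 0 else -1
        terms.append((s, e))
        approx += s * (1 << e)
    return terms

def shift_mac(x, terms):
    """Multiply activation x by the decomposed weight using only shifts/adds."""
    return sum(s * (x << e) for s, e in terms)

# weight 6 = 2^3 - 2^1, so x*6 costs two shifts and one add instead of a multiply
print(shift_mac(10, pow2_terms(6, k=2)))  # 60 == 10 * 6
```

With $k=1$ each weight is a single power of 2 (one shift per multiply); $k=2$ doubles the adder/shifter cost but approximates weights more closely, which is the accuracy-versus-throughput trade-off FLightNNs tune per filter.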
URL
https://arxiv.org/abs/1904.02835