Abstract
Neural network accelerators with low latency and low energy consumption are desirable for edge computing. To create such accelerators, we propose a design flow for accelerating extremely low bit-width neural networks (ELB-NN) in embedded FPGAs with hybrid quantization schemes. The flow covers both network training and FPGA-based network deployment, which facilitates design space exploration and simplifies the tradeoff between network accuracy and computation efficiency. It helps hardware designers deliver network accelerators for edge devices under strict resource and power constraints. We demonstrate the flow by supporting hybrid ELB settings within a neural network. Results show that our design delivers a peak performance of 10.3 TOPS and classifies up to 325.3 images/s/W while running large-scale neural networks at less than 5 W on an embedded FPGA. To the best of our knowledge, it is the most energy-efficient solution compared with GPU and other FPGA implementations reported in the literature so far.
URL
https://arxiv.org/abs/1808.04311