Abstract
Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute- and memory-intensive, which makes them unsuitable for milliwatt-range devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces their computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit. Hardware accelerators for BWNs presented up to now have focused on core efficiency, disregarding the I/O bandwidth and system-level efficiency that are crucial for deploying accelerators in ultra-low-power devices. We present Hyperdrive, a BWN accelerator that dramatically reduces I/O bandwidth by exploiting a novel binary-weight streaming approach. It supports arbitrarily sized convolutional neural network architectures and input resolutions by exploiting the natural scalability of its compute units at both chip and system level: Hyperdrive chips are arranged systolically in a 2D mesh and process the entire feature map together in parallel. Hyperdrive achieves a system-level efficiency of 4.3 TOp/s/W (i.e., including I/Os), 3.1x higher than state-of-the-art BWN accelerators, even though its core uses resource-intensive FP16 arithmetic for increased robustness.
URL
https://arxiv.org/abs/1804.00623