Abstract
Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute- and memory-intensive, which makes them unsuitable for milliwatt-range devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces their computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit. Hardware accelerators for BWNs presented up to now have focused on core efficiency, disregarding the I/O bandwidth and system-level efficiency that are crucial for deploying accelerators in ultra-low-power devices. We present Hyperdrive, a BWN accelerator that dramatically reduces I/O bandwidth by exploiting a novel binary-weight streaming approach. It supports arbitrarily sized convolutional neural network architectures and input resolutions by exploiting the natural scalability of its compute units at both chip and system level: Hyperdrive chips are arranged systolically in a 2D mesh and process the entire feature map together in parallel. Hyperdrive achieves a system-level efficiency of 4.3 TOp/s/W (i.e., including I/Os), 3.1x higher than state-of-the-art BWN accelerators, even though its core uses resource-intensive FP16 arithmetic for increased robustness.
URL
https://arxiv.org/abs/1804.00623