Abstract
Deploying state-of-the-art CNNs requires power-hungry processors and off-chip memory, which precludes implementing CNNs in low-power embedded systems. Recent research shows that CNNs sustain extreme quantization: binarizing their weights and intermediate feature maps saves 8-32× in memory and collapses energy-intensive sum-of-products into XNOR-and-popcount operations. We present XNORBIN, an accelerator for binary CNNs whose computation is tightly coupled to memory for aggressive data reuse. Implemented in UMC 65 nm technology, XNORBIN achieves an energy efficiency of 95 TOp/s/W and an area efficiency of 2.0 TOp/s/MGE at 0.8 V.
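The XNOR-and-popcount trick the abstract refers to can be sketched as follows: when both operands are restricted to {-1, +1} and packed as bit vectors (bit 1 encoding +1), the dot product reduces to one bitwise XNOR plus a popcount. This is a minimal illustrative sketch, not XNORBIN's actual datapath; the function and variable names are my own.

```python
def bin_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed as n-bit
    integers (bit = 1 encodes +1, bit = 0 encodes -1).

    Each XNOR bit is 1 where the operands agree (product +1) and 0 where
    they differ (product -1), so the dot product is
    (#agreements) - (#disagreements) = 2 * popcount(xnor) - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask  # 1 where the packed bits agree
    return 2 * bin(xnor).count("1") - n


# Cross-check against the explicit sum of products:
a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
pack = lambda v: sum(1 << i for i, x in enumerate(v) if x == 1)
assert bin_dot(pack(a), pack(b), 4) == sum(x * y for x, y in zip(a, b))
```

In hardware, the same reduction replaces a multiply-accumulate array with XNOR gates and a popcount tree, which is the source of the energy savings the abstract claims.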
URL
https://arxiv.org/abs/1803.05849