Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

2019-03-14 11:15:37
Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini

Abstract

Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute- and memory-intensive, which makes them unsuitable for mW-devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces their computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit. Hardware accelerators for BWNs presented up to now have focused on core efficiency, disregarding the I/O bandwidth and system-level efficiency that are crucial for deploying accelerators in ultra-low-power devices. We present Hyperdrive, a BWN accelerator that dramatically reduces I/O bandwidth by exploiting a novel binary-weight streaming approach. It can be used for convolutional neural network architectures and input resolutions of arbitrary size by exploiting the natural scalability of the compute units at both chip level and system level: Hyperdrive chips are arranged systolically in a 2D mesh and process the entire feature map together in parallel. Hyperdrive achieves 4.3 TOp/s/W system-level efficiency (i.e., including I/Os), 3.1x higher than state-of-the-art BWN accelerators, even though its core uses resource-intensive FP16 arithmetic for increased robustness.
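
To make the idea of binary weights concrete, the following is a minimal sketch and not the Hyperdrive implementation. It assumes weights are binarized by sign to +1 or -1 (the abstract does not specify the binarization rule), and the helper names binarize, binary_weight_dot, and pack_bits are hypothetical. It illustrates two general properties of BWNs: a dot product needs only additions and subtractions, and each weight can be stored or streamed as a single bit.

```python
def binarize(weights):
    """Map each real-valued weight to +1 or -1 by its sign (assumed rule; zero maps to +1)."""
    return [1 if w >= 0 else -1 for w in weights]


def binary_weight_dot(activations, binary_weights):
    """Dot product with {-1, +1} weights: only additions and subtractions, no multiplies."""
    acc = 0.0
    for a, w in zip(activations, binary_weights):
        if w == 1:
            acc += a
        else:
            acc -= a
    return acc


def pack_bits(binary_weights):
    """Pack {-1, +1} weights into bytes, one bit per weight (+1 -> 1, -1 -> 0)."""
    packed = bytearray((len(binary_weights) + 7) // 8)
    for i, w in enumerate(binary_weights):
        if w == 1:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


# Illustrative values only, not taken from the paper.
weights = [0.7, -0.2, 0.05, -1.3, 0.4, -0.6, 0.9, -0.1]
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

bw = binarize(weights)
result = binary_weight_dot(activations, bw)
packed = pack_bits(bw)

print(bw)           # binarized weights
print(result)       # accumulated sum using adds/subtracts only
print(len(packed))  # bytes needed to hold the 8 binary weights
```

In this sketch the eight binary weights fit in one byte, whereas the same eight weights stored in a 16-bit format would occupy 16 bytes. Activations stay in a higher-precision format, consistent with the abstract's statement that the core uses FP16 arithmetic.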

URL

https://arxiv.org/abs/1804.00623

PDF

https://arxiv.org/pdf/1804.00623.pdf

