FLightNNs: Lightweight Quantized Deep Neural Networks for Fast and Accurate Inference

2019-04-05 00:27:16
Ruizhou Ding, Zeye Liu, Ting-Wu Chin, Diana Marculescu, R. D. (Shawn) Blanton

Abstract

To improve the throughput and energy efficiency of Deep Neural Networks (DNNs) on customized hardware, lightweight neural networks constrain the weights of DNNs to be a limited combination of powers of 2, where the number of terms is denoted $k\in\{1,2\}$. In such networks, the multiply-accumulate operation can be replaced with a single shift operation, or with two shifts and an add operation. To provide even more design flexibility, the $k$ for each convolutional filter can be optimally chosen instead of being fixed for every filter. In this paper, we formulate the selection of $k$ to be differentiable, and describe model training for determining $k$-based weights on a per-filter basis. Over 46 FPGA-design experiments involving eight configurations and four data sets reveal that lightweight neural networks with a flexible $k$ value (dubbed FLightNNs) fully utilize the hardware resources on Field Programmable Gate Arrays (FPGAs); our experimental results show that FLightNNs can achieve a 2$\times$ speedup compared to lightweight NNs with $k=2$, with only 0.1\% accuracy degradation. Compared to a 4-bit fixed-point quantization, FLightNNs achieve higher accuracy and up to 2$\times$ inference speedup, due to their lightweight shift operations. In addition, our experiments also demonstrate that FLightNNs can achieve higher computational energy efficiency for ASIC implementations.
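
To make the shift-based arithmetic concrete, below is a minimal Python sketch (not from the paper) of how a weight constrained to a sum of $k$ signed powers of 2 turns a multiply-accumulate into shifts and adds: one shift for $k=1$, two shifts and an add for $k=2$. The function names and the fixed-point treatment of the activation are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (illustrative, not the authors' code): a weight of the form
#   w = s1 * 2**p1                  -> k = 1: one shift
#   w = s1 * 2**p1 + s2 * 2**p2     -> k = 2: two shifts and an add
# lets the multiply-accumulate acc + x * w be computed without a multiplier.

def shift_mul(x: int, sign: int, exp: int) -> int:
    """Multiply a fixed-point activation x by sign * 2**exp using a shift."""
    return sign * (x << exp) if exp >= 0 else sign * (x >> -exp)

def mac_k1(x: int, sign: int, exp: int, acc: int) -> int:
    """k = 1: the multiply-accumulate collapses to a single shift plus an add."""
    return acc + shift_mul(x, sign, exp)

def mac_k2(x: int, s1: int, p1: int, s2: int, p2: int, acc: int) -> int:
    """k = 2: two shifts and an add replace the multiplier."""
    return acc + shift_mul(x, s1, p1) + shift_mul(x, s2, p2)

# Example: weight -2 = -(2**1) with k = 1, and weight 0.375 = 2**-2 + 2**-3
# with k = 2, both applied to an activation x = 8 treated as a fixed-point integer.
print(mac_k1(8, -1, 1, acc=0))           # -16, i.e. 8 * (-2)
print(mac_k2(8, +1, -2, +1, -3, acc=0))  # 2 + 1 = 3, i.e. 8 * 0.375
```

On hardware, replacing multipliers with such shift-add logic is what lets filters with smaller $k$ consume fewer resources, which is the source of the speedups reported in the abstract.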

URL

https://arxiv.org/abs/1904.02835

PDF

https://arxiv.org/pdf/1904.02835.pdf
