Paper Reading AI Learner

Additive Noise Annealing and Approximation Properties of Quantized Neural Networks

2019-05-24 21:30:54
Matteo Spallanzani, Lukas Cavigelli, Gian Paolo Leonardi, Marko Bertogna, Luca Benini

Abstract

We present a theoretical and experimental investigation of the quantization problem for artificial neural networks. We provide a mathematical definition of quantized neural networks and analyze their approximation capabilities, showing in particular that any Lipschitz-continuous map defined on a hypercube can be uniformly approximated by a quantized neural network. We then focus on the regularization effect of additive noise on the arguments of multi-step functions inherent to the quantization of continuous variables. In particular, when the expectation operator is applied to a non-differentiable multi-step random function, and if the underlying probability density is differentiable (in either classical or weak sense), then a differentiable function is retrieved, with explicit bounds on its Lipschitz constant. Based on these results, we propose a novel gradient-based training algorithm for quantized neural networks that generalizes the straight-through estimator, acting on noise applied to the network's parameters. We evaluate our algorithm on the CIFAR-10 and ImageNet image classification benchmarks, showing state-of-the-art performance on AlexNet and MobileNetV2 for ternary networks.
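The smoothing effect of additive noise on a multi-step function can be checked numerically. The sketch below illustrates that effect only. It is not the authors' training algorithm. The example quantizer is a hypothetical ternary quantizer with thresholds ±t (t = 0.5), and the noise ν is zero-mean uniform with half-width a (a = 0.25); both choices are for illustration only. Under these assumptions, E[q(x + ν)] has a closed form in which each unit jump becomes a linear ramp of width 2a. The result is piecewise linear, so it is Lipschitz with constant 1/(2a) (the jump height times the peak noise density) and differentiable everywhere except at the ramp endpoints. A smooth density such as a Gaussian would make it differentiable everywhere. Shrinking a toward zero recovers the hard quantizer.

```python
import numpy as np

# Assumption (illustrative, not from the paper): ternary quantizer with
# thresholds at +-t, and additive noise nu ~ Uniform(-a, a).

def ternary(x, t=0.5):
    """Multi-step quantizer: -1 below -t, +1 above +t, 0 in between."""
    return np.where(x > t, 1.0, 0.0) - np.where(x < -t, 1.0, 0.0)

def smoothed_ternary(x, t=0.5, a=0.25):
    """Closed form of E[ternary(x + nu)] for nu ~ Uniform(-a, a)."""
    up = np.clip((x - t + a) / (2 * a), 0.0, 1.0)     # P(x + nu > t)
    down = np.clip((a - t - x) / (2 * a), 0.0, 1.0)   # P(x + nu < -t)
    return up - down

def smoothed_ternary_grad(x, t=0.5, a=0.25):
    """Derivative of the smoothed quantizer: the noise density 1/(2a)
    on each ramp |x -+ t| < a, and 0 elsewhere."""
    on_up = (np.abs(x - t) < a).astype(float)
    on_down = (np.abs(x + t) < a).astype(float)
    return (on_up + on_down) / (2 * a)

if __name__ == "__main__":
    grid = np.linspace(-1.0, 1.0, 2001)
    # Largest slope on the grid, i.e. the Lipschitz constant 1 / (2a).
    print(np.max(smoothed_ternary_grad(grid)))  # expected: 2.0
```

Replace the two jumps with a single jump of height 2 (the sign function) and set a = 1. The same construction then gives a gradient of 1 on |x| < 1 and 0 elsewhere. That matches the commonly used clipped straight-through estimator. This is one way to read the abstract's claim that the method generalizes the STE, although the paper's actual algorithm may differ.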

URL

https://arxiv.org/abs/1905.10452

PDF

https://arxiv.org/pdf/1905.10452.pdf
