Fixed-point quantization aware training for on-device keyword-spotting

2023-03-04 01:06:16
Sashank Macha, Om Oza, Alex Escott, Francesco Caliva, Robbie Armitano, Santosh Kumar Cheekatmalla, Sree Hari Krishnan Parthasarathi, Yuzong Liu

Abstract

Fixed-point (FXP) inference has proven suitable for embedded devices with limited computational resources, yet model training is still typically performed in floating-point (FLP). FXP training has not been fully explored, and the non-trivial conversion from FLP to FXP introduces an unavoidable performance drop. We propose a novel method to train and obtain FXP convolutional keyword-spotting (KWS) models. We combine our methodology with two quantization-aware-training (QAT) techniques, squashed weight distribution and absolute cosine regularization for model parameters, and propose techniques that extend QAT to transient variables, which previous paradigms have neglected. Experimental results on the Google Speech Commands v2 dataset show that we can reduce model precision down to 4 bits with no loss in accuracy. Furthermore, on an in-house KWS dataset, our 8-bit FXP-QAT models achieve a 4-6% relative improvement in false discovery rate at a fixed false reject rate compared to full-precision FLP models. During inference, we argue that FXP-QAT eliminates q-format normalization and enables the use of low-bit accumulators while maximizing SIMD throughput to reduce user-perceived latency. We demonstrate that we can reduce execution time by 68% without compromising the KWS model's predictive performance or requiring architectural changes. Our work provides novel findings that aid future research in this area and enables accurate and efficient models.
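The abstract compresses several techniques into a few sentences. As a rough illustration of the core FXP-QAT ingredient, the sketch below simulates fixed-point rounding in the forward pass while letting gradients pass straight through the rounding step. This is a generic fake-quantization sketch under assumed conventions (a shared Qm.f format, symmetric clipping, PyTorch); it does not reproduce the paper's squashed weight distribution, absolute cosine regularization, or transient-variable QAT, and all names here are illustrative.

```python
import torch

def fake_quantize(x: torch.Tensor, n_bits: int = 8, frac_bits: int = 6) -> torch.Tensor:
    """Simulate fixed-point (Qm.f) rounding in the forward pass.

    Illustrative assumption only; not the authors' implementation.
    """
    scale = float(2 ** frac_bits)        # one LSB corresponds to 2^-frac_bits
    qmin = -(2 ** (n_bits - 1))          # e.g. -128 for 8 bits
    qmax = 2 ** (n_bits - 1) - 1         # e.g. +127 for 8 bits
    q = torch.clamp(torch.round(x * scale), qmin, qmax) / scale
    # Straight-through estimator: the forward pass sees the quantized value,
    # the backward pass treats rounding as identity so gradients keep flowing.
    return x + (q - x).detach()

# During FXP-QAT, weights (and, per the paper's argument, transient
# variables such as activations) would be passed through this op.
w = torch.randn(16, requires_grad=True)
loss = fake_quantize(w, n_bits=4, frac_bits=2).pow(2).sum()
loss.backward()
```

On the inference-side claim (eliminating q-format normalization and using low-bit accumulators with high SIMD throughput), a minimal numpy sketch of a fixed-point dot product shows the mechanics, assuming every tensor shares a Q1.6 format; the paper's actual kernels and formats are not specified on this page.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical Q1.6 operands: int8 values encoding real numbers times 2^6.
x = np.round(rng.uniform(-1, 1, 64) * 64).astype(np.int8)
w = np.round(rng.uniform(-1, 1, 64) * 64).astype(np.int8)

# Integer multiply-accumulate: the product of two Q1.6 values has a 2^12
# scale, so a single arithmetic right shift restores the Q1.6 scale. With
# one shared Q-format there is no per-tensor floating-point rescaling
# ("q-format normalization") between layers, and the narrow operands let
# more lanes fit in each SIMD register.
acc = np.dot(x.astype(np.int32), w.astype(np.int32))  # 32-bit accumulator
y = int(acc) >> 6                                      # back to Q1.6 scale
```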

URL

https://arxiv.org/abs/2303.02284

PDF

https://arxiv.org/pdf/2303.02284.pdf

