Abstract
Deploying complex neural network models on hardware with limited resources is critical. This paper proposes a novel model quantization method, the Low-Cost Proxy-Based Adaptive Mixed-Precision Model Quantization (LCPAQ), which contains three key modules. A hardware-aware module is designed around the hardware's limitations, while an adaptive mixed-precision quantization module evaluates quantization sensitivity using the Hessian matrix and Pareto-frontier techniques. Integer linear programming is then used to fine-tune the bit-width allocation across layers, and a low-cost proxy neural architecture search module efficiently explores the ideal quantization hyperparameters. Experiments on ImageNet demonstrate that LCPAQ achieves quantization accuracy comparable or superior to existing mixed-precision models, while requiring only 1/200 of the search time of existing methods, offering a practical shortcut to quantization for resource-limited devices.
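The bit-allocation step the abstract describes can be viewed as a small integer program: given a per-layer sensitivity score (e.g., a Hessian-trace estimate) and per-layer parameter counts, choose a bit-width for each layer that minimizes the total sensitivity-weighted quantization error under a model-size budget. The sketch below is illustrative only and not the paper's implementation: the sensitivities, parameter counts, and the `2^(-2b)` error model are assumptions, and a tiny exhaustive search stands in for a real ILP solver.

```python
from itertools import product

def quant_error(sensitivity, bits):
    # Assumed error model: uniform-quantizer noise shrinks roughly
    # 4x per extra bit (proportional to 2^(-2b)), scaled by sensitivity.
    return sensitivity * 2.0 ** (-2 * bits)

def allocate_bits(sensitivities, n_params, budget_bits, choices=(2, 4, 8)):
    """Exhaustively solve the tiny integer program: pick a bit-width per
    layer minimizing total sensitivity-weighted error under a size budget."""
    best, best_cost = None, float("inf")
    for assign in product(choices, repeat=len(sensitivities)):
        size = sum(b * n for b, n in zip(assign, n_params))
        if size > budget_bits:
            continue  # violates the model-size constraint
        cost = sum(quant_error(s, b) for s, b in zip(sensitivities, assign))
        if cost < best_cost:
            best, best_cost = assign, cost
    return best, best_cost

# Hypothetical 4-layer model: sensitive layers should receive more bits.
sens = [10.0, 1.0, 0.5, 8.0]       # illustrative Hessian-trace proxies
params = [1000, 4000, 4000, 1000]  # illustrative parameter counts
budget = 4 * sum(params)           # average of 4 bits per parameter
bits, cost = allocate_bits(sens, params, budget)
print(bits)  # → (8, 4, 2, 8): high bits for sensitive layers 1 and 4
```

Note how the solution concentrates precision on the small, sensitive layers while the large, insensitive layers absorb the size savings; a real system would replace the exhaustive loop with an ILP solver, as the paper's search space is far larger.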
URL
https://arxiv.org/abs/2402.17706