TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models

Abstract

Diffusion models have emerged as leading generative models. Their sequential generation process spans hundreds or even thousands of timesteps, progressively reconstructing images from pure Gaussian noise, and each timestep requires a full inference pass through the entire model. The resulting computational demands make deployment challenging, so quantization is widely used to lower bit-widths and reduce storage and compute overheads. Current quantization methods focus primarily on model-side optimization and disregard the temporal dimension, such as the length of the timestep sequence; redundant timesteps therefore continue to consume computational resources, leaving substantial room to accelerate generation. In this paper, we introduce TMPQ-DM, which jointly optimizes timestep reduction and quantization to achieve a superior performance-efficiency trade-off, addressing both temporal and model optimization. For timestep reduction, we devise a non-uniform grouping scheme tailored to the non-uniform nature of the denoising process, mitigating the combinatorial explosion of timestep choices. For quantization, we adopt a fine-grained layer-wise approach that allocates different bit-widths to different layers according to their contributions to final generative performance, rectifying the performance degradation observed in prior studies. To expedite the evaluation of fine-grained quantization, we further devise a super-network that serves as a precision solver by leveraging shared quantization results. These two components are integrated within our framework, enabling rapid joint exploration of the exponentially large decision space via a gradient-free evolutionary search algorithm.
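
The joint search described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the group boundaries, bit-width choices, cost model, and the `toy_quality` score are all hypothetical stand-ins (a real system would presumably score candidates with a generative metric on a quantized super-network). The sketch only shows the general shape of a gradient-free evolutionary search over a candidate that combines a per-group timestep count with a per-layer bit-width.

```python
import random

# Every constant here is an illustrative assumption, not a value from the paper.
NUM_LAYERS = 6
BIT_CHOICES = (4, 6, 8)            # candidate bit-widths per layer
# Non-uniform groups over a 100-step schedule (hypothetical boundaries).
GROUPS = [(0, 10), (10, 25), (25, 50), (50, 100)]
KEEP_CHOICES = (1, 2, 4)           # timesteps retained per group


def random_candidate(rng):
    """Sample one joint configuration: steps kept per group, bits per layer."""
    return {
        "keep": [rng.choice(KEEP_CHOICES) for _ in GROUPS],
        "bits": [rng.choice(BIT_CHOICES) for _ in range(NUM_LAYERS)],
    }


def selected_timesteps(cand):
    """Expand per-group counts into evenly spaced timesteps inside each group."""
    steps = []
    for (lo, hi), k in zip(GROUPS, cand["keep"]):
        stride = (hi - lo) / k
        steps.extend(int(lo + i * stride) for i in range(k))
    return steps


def cost(cand):
    """Toy compute cost: retained timesteps times mean bit-width."""
    return sum(cand["keep"]) * sum(cand["bits"]) / NUM_LAYERS


def toy_quality(cand):
    """Synthetic stand-in for a real quality metric; higher is better."""
    return sum(k ** 0.5 for k in cand["keep"]) + sum(b / 8 for b in cand["bits"])


def fitness(cand, budget):
    """Quality minus a penalty for exceeding the compute budget."""
    return toy_quality(cand) - max(0.0, cost(cand) - budget)


def mutate(cand, rng, p=0.2):
    """Resample each gene independently with probability p."""
    return {
        "keep": [rng.choice(KEEP_CHOICES) if rng.random() < p else k
                 for k in cand["keep"]],
        "bits": [rng.choice(BIT_CHOICES) if rng.random() < p else b
                 for b in cand["bits"]],
    }


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from one of the two parents."""
    return {key: [rng.choice(pair) for pair in zip(a[key], b[key])]
            for key in ("keep", "bits")}


def evolutionary_search(budget=40.0, pop_size=20, generations=15, seed=0):
    """Gradient-free search: keep the top quarter, refill with mutated offspring."""
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, budget), reverse=True)
        parents = population[: pop_size // 4]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=lambda c: fitness(c, budget))


if __name__ == "__main__":
    best = evolutionary_search()
    print("timesteps:", selected_timesteps(best))
    print("bits per layer:", best["bits"])
```

Searching over one count per group rather than one decision per timestep is what keeps the timestep part of the space small in this sketch; whether the paper's grouping scheme works this way is an assumption.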

URL

https://arxiv.org/abs/2404.09532

PDF

https://arxiv.org/pdf/2404.09532.pdf
