Paper Reading AI Learner

BinaryDM: Towards Accurate Binarization of Diffusion Model

2024-04-08 16:46:25
Xingyu Zheng, Haotong Qin, Xudong Ma, Mingyuan Zhang, Haojie Hao, Jiakai Wang, Zixiang Zhao, Jinyang Guo, Xianglong Liu

Abstract

With the advancement of diffusion models (DMs) and the substantially increased computational requirements, quantization emerges as a practical solution to obtain compact and efficient low-bit DMs. However, the highly discrete representation leads to severe accuracy degradation, hindering the quantization of diffusion models to ultra-low bit-widths. In this paper, we propose BinaryDM, a novel accurate quantization-aware training approach to push the weights of diffusion models towards the limit of 1-bit. Firstly, we present a Learnable Multi-basis Binarizer (LMB) to recover the representations generated by the binarized DM, which improves the information in details of representations crucial to the DM. Secondly, a Low-rank Representation Mimicking (LRM) is applied to enhance the binarization-aware optimization of the DM, alleviating the optimization direction ambiguity caused by fine-grained alignment. Moreover, a progressive initialization strategy is applied to training DMs to avoid convergence difficulties. Comprehensive experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths. As the first binarization method for diffusion models, BinaryDM achieves impressive 16.0 times FLOPs and 27.1 times storage savings with 1-bit weight and 4-bit activation, showcasing its substantial advantages and potential for deploying DMs on resource-limited scenarios.

Abstract (translated)

随着扩散模型(DMs)的进步和计算需求的增加,量化成为获得紧凑和高效的低位位DM的实用解决方案。然而,高度离散的表示导致精确度降低,阻碍了扩散模型的量化达到超低位宽。在本文中,我们提出了BinaryDM,一种新颖的准确量化感知训练方法,将扩散模型的权重推向1位的极限。首先,我们提出了可学习的多基元二进制化器(LMB),以恢复二进制化DM生成的表示,这有助于提高对DM中关键表示信息。其次,低秩表示模拟(LRM)应用于增强DM的量化意识,减轻了微细对齐引起的优化方向不确定性。此外,我们还应用了逐步初始化策略来训练DM,以避免收敛困难。综合实验证明,BinaryDM在超低位宽的DM上实现了显著的准确性和效率提升,与当前的DM量化方法相比。作为扩散模型的第一种二进制化方法,BinaryDM在1个位宽和4个位宽的权重下,取得了令人印象深刻的16.0倍FLOPs和27.1倍存储节省,展示了其在有限资源场景下部署DM的显著优势和潜在。

URL

https://arxiv.org/abs/2404.05662

PDF

https://arxiv.org/pdf/2404.05662.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot