Accelerating Image Generation with Sub-path Linear Approximation Model

2024-04-22 06:25:17
Chen Xu, Tianhui Song, Weixin Feng, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang

Abstract

Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their use in practical scenarios is hindered by slow inference. Drawing inspiration from the approximation strategies used in consistency models, we propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining high-quality image generation. SLAM treats the PF-ODE trajectory as a series of PF-ODE sub-paths divided by sampled points, and uses sub-path linear (SL) ODEs to form a progressive and continuous error estimation along each individual PF-ODE sub-path. Optimizing on these SL-ODEs allows SLAM to construct denoising mappings with smaller cumulative approximation errors. An efficient distillation method is also developed to facilitate the incorporation of more advanced diffusion models, such as latent diffusion models. Our extensive experimental results demonstrate that SLAM trains efficiently, requiring only 6 A100 GPU days to produce a high-quality generative model capable of 2- to 4-step generation. Comprehensive evaluations on the LAION, MS COCO 2014, and MS COCO 2017 datasets also show that SLAM surpasses existing acceleration methods in few-step generation tasks, achieving state-of-the-art performance in both FID and the quality of the generated images.
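
The abstract describes the sub-path idea only at a high level, so the following is a minimal sketch of the general notion, not SLAM's training objective or the authors' code. It assumes a toy scalar ODE (dx/dt = -x, solved in closed form) as a stand-in for a PF-ODE trajectory, splits the time range into sub-paths at evenly spaced points, and linearly interpolates between the endpoints of each sub-path. The names `trajectory`, `sub_path_linear`, and `max_error` are hypothetical and chosen for illustration.

```python
import math


def trajectory(t):
    """Toy stand-in for a PF-ODE solution: dx/dt = -x with x(0) = 1 (an assumption)."""
    return math.exp(-t)


def sub_path_linear(t, knots):
    """Evaluate the piecewise-linear approximation of the trajectory at time t.

    `knots` are the points that divide the trajectory into sub-paths.
    Within each sub-path the value is a straight line between its endpoints.
    """
    for t0, t1 in zip(knots, knots[1:]):
        if t0 <= t <= t1:
            gamma = (t - t0) / (t1 - t0)  # position within the sub-path, in [0, 1]
            return (1 - gamma) * trajectory(t0) + gamma * trajectory(t1)
    raise ValueError("t lies outside the range covered by the knots")


def max_error(num_sub_paths, samples=1000):
    """Largest gap between the true trajectory and its sub-path linear approximation on [0, 1]."""
    knots = [i / num_sub_paths for i in range(num_sub_paths + 1)]
    return max(
        abs(sub_path_linear(s / samples, knots) - trajectory(s / samples))
        for s in range(samples + 1)
    )


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, max_error(n))
```

In this toy setting the approximation error shrinks as the trajectory is divided into more sub-paths, which is the intuition behind approximating each sub-path separately rather than the whole trajectory at once.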

URL

https://arxiv.org/abs/2404.13903

PDF

https://arxiv.org/pdf/2404.13903.pdf

