Abstract
Diffusion-based image generation models excel at producing high-quality synthetic content, but suffer from slow and computationally expensive inference. Prior work has attempted to mitigate this by caching and reusing features within diffusion transformers across inference steps. These methods, however, often rely on rigid heuristics that result in limited acceleration or poor generalization across architectures. We propose Evolutionary Caching to Accelerate Diffusion models (ECAD), a genetic algorithm that learns efficient, per-model caching schedules forming a Pareto frontier, using only a small set of calibration prompts. ECAD requires no modifications to network parameters and no reference images. It offers significant inference speedups, enables fine-grained control over the quality-latency trade-off, and adapts seamlessly to different diffusion models. Notably, ECAD's learned schedules can generalize effectively to resolutions and model variants not seen during calibration. We evaluate ECAD on PixArt-alpha, PixArt-Sigma, and a third diffusion model using multiple metrics (FID, CLIP, Image Reward) across diverse benchmarks (COCO, MJHQ-30k, PartiPrompts), demonstrating consistent improvements over previous approaches. On PixArt-alpha, ECAD identifies a schedule that outperforms the previous state-of-the-art method by 4.47 COCO FID while increasing inference speedup from 2.35x to 2.58x. Our results establish ECAD as a scalable and generalizable approach for accelerating diffusion inference. A project website and the code are publicly available.
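The abstract's core idea, a genetic search over caching schedules that yields a Pareto frontier between speed and quality, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the schedule is modeled as a steps-by-blocks grid of booleans, `quality_penalty` is a made-up proxy standing in for image metrics computed on calibration prompts, and the selection scheme is a simple elitist loop rather than whatever ECAD actually uses.

```python
import random

# Hypothetical sizes: a schedule is a STEPS x BLOCKS grid of booleans,
# where True means "reuse the cached output of this block at this step".
STEPS, BLOCKS = 8, 4

def random_schedule(rng):
    # Step 0 is never cached: there is nothing to reuse yet.
    return [[s > 0 and rng.random() < 0.5 for _ in range(BLOCKS)]
            for s in range(STEPS)]

def compute_cost(schedule):
    """Fraction of block evaluations actually computed (lower is faster)."""
    cached = sum(cell for row in schedule for cell in row)
    return 1.0 - cached / (STEPS * BLOCKS)

def quality_penalty(schedule):
    """Toy stand-in for a quality metric (lower is better).

    Assumption: reusing a stale feature hurts more the longer it is reused.
    A real system would generate images and score them instead.
    """
    penalty = 0.0
    for b in range(BLOCKS):
        run = 0
        for s in range(STEPS):
            run = run + 1 if schedule[s][b] else 0
            penalty += run * run
    return penalty

def evaluate(schedule):
    return (compute_cost(schedule), quality_penalty(schedule))

def dominates(a, b):
    """True if objective tuple a is no worse than b everywhere and differs."""
    return all(x <= y for x, y in zip(a, b)) and a != b

def pareto_front(scored):
    """Keep (objectives, schedule) pairs that no other pair dominates."""
    return [p for p in scored
            if not any(dominates(q[0], p[0]) for q in scored)]

def crossover(a, b, rng):
    # Single-point crossover along the step axis; rows are copied.
    cut = rng.randrange(1, STEPS)
    return [row[:] for row in a[:cut]] + [row[:] for row in b[cut:]]

def mutate(schedule, rng, rate=0.05):
    # Flip individual cache decisions, leaving step 0 uncached.
    for s in range(1, STEPS):
        for b in range(BLOCKS):
            if rng.random() < rate:
                schedule[s][b] = not schedule[s][b]
    return schedule

def evolve(generations=20, pop_size=24, seed=0):
    rng = random.Random(seed)
    population = [random_schedule(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(evaluate(ind), ind) for ind in population]
        # Elitism: non-dominated schedules survive, capped at half the pool.
        parents = [ind for _, ind in pareto_front(scored)][:pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return pareto_front([(evaluate(ind), ind) for ind in population])

if __name__ == "__main__":
    for (cost, penalty), _ in sorted(evolve(), key=lambda p: p[0]):
        print(f"compute fraction {cost:.2f}  quality penalty {penalty:.0f}")
```

Each schedule on the returned frontier is one point on the quality-latency trade-off, so a user could pick the fastest schedule whose penalty stays under a chosen budget. No network weights are touched at any point, which mirrors the abstract's claim that only the caching schedule is learned.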
URL
https://arxiv.org/abs/2506.15682