Abstract
Beyond scaling base models with more data or parameters, fine-tuned adapters provide an alternative way to generate high-fidelity, custom images at reduced cost. As such, adapters have been widely adopted by open-source communities, accumulating a database of over 100K adapters, most of which are highly customized and insufficiently described. This paper explores the problem of matching a prompt to a set of relevant adapters, building on recent work that highlights the performance gains of composing adapters. We introduce Stylus, which efficiently selects and automatically composes task-specific adapters based on a prompt's keywords. Stylus outlines a three-stage approach that first summarizes adapters with improved descriptions and embeddings, then retrieves relevant adapters, and finally assembles adapters based on the prompt's keywords by checking how well they fit the prompt. To evaluate Stylus, we developed StylusDocs, a curated dataset featuring 75K adapters with pre-computed adapter embeddings. In our evaluation on popular Stable Diffusion checkpoints, Stylus achieves greater CLIP-FID Pareto efficiency and is preferred twice as often as the base model, with humans and multimodal models as evaluators.
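The three-stage flow described in the abstract (describe and embed adapters, retrieve candidates, then assign them to prompt keywords) can be illustrated with a toy sketch. This is not the Stylus implementation: the adapter names and descriptions, the bag-of-words embedding, and the functions `build_index`, `retrieve`, and `compose` are all hypothetical stand-ins chosen so the example runs with only the Python standard library. A real system would presumably use learned embeddings and a model-based fit check in place of simple word overlap.

```python
import math
import re
from collections import Counter

# Hypothetical adapter catalog; names and descriptions are invented for illustration.
ADAPTERS = {
    "watercolor_style": "soft watercolor painting style with pastel colors",
    "cyberpunk_city": "neon cyberpunk city street at night",
    "corgi_dog": "cute corgi dog with fluffy fur",
}


def embed(text):
    """Toy bag-of-words embedding (assumption: stands in for a learned text encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(adapters):
    """Stage 1 (simplified): pre-compute one embedding per adapter description."""
    return {name: embed(desc) for name, desc in adapters.items()}


def retrieve(prompt, index, k=2):
    """Stage 2: return up to k adapters whose embeddings are most similar to the prompt."""
    query = embed(prompt)
    scored = sorted(((cosine(query, vec), name) for name, vec in index.items()), reverse=True)
    return [name for score, name in scored[:k] if score > 0]


def compose(prompt, candidates, adapters):
    """Stage 3 (simplified): map each prompt keyword to the candidates that mention it."""
    keywords = set(embed(prompt))
    assignment = {}
    for name in candidates:
        for keyword in sorted(keywords & set(embed(adapters[name]))):
            assignment.setdefault(keyword, []).append(name)
    return assignment


if __name__ == "__main__":
    index = build_index(ADAPTERS)
    prompt = "a corgi in watercolor style"
    candidates = retrieve(prompt, index)
    print(candidates)
    print(compose(prompt, candidates, ADAPTERS))
```

In this sketch, retrieval narrows the catalog cheaply before the per-keyword assignment runs, which mirrors the ordering of the stages in the abstract; how Stylus actually scores fit between adapters and keywords is not specified here.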
URL
https://arxiv.org/abs/2404.18928