Abstract
Image stylization involves manipulating the visual appearance and texture (style) of an image while preserving its underlying objects, structures, and concepts (content). Separating style from content is essential for manipulating an image's style independently of its content and for producing a harmonious, visually pleasing result. Achieving this separation requires a deep understanding of both the visual and semantic characteristics of images, which often means training specialized models or running heavy optimization. In this paper, we introduce B-LoRA, a method that leverages LoRA (Low-Rank Adaptation) to implicitly separate the style and content components of a single image, facilitating various image stylization tasks. By analyzing the architecture of SDXL combined with LoRA, we find that jointly learning the LoRA weights of two specific blocks (referred to as B-LoRAs) achieves a style-content separation that cannot be achieved by training each B-LoRA independently. Restricting training to only these two blocks and separating style from content significantly improves style manipulation and overcomes the overfitting issues often associated with model fine-tuning. Once trained, the two B-LoRAs can be used as independent components for various image stylization tasks, including image style transfer, text-based image stylization, consistent style generation, and style-content mixing.
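The idea of attaching low-rank updates to only two blocks and later recombining them can be illustrated with a minimal sketch. It is not the authors' implementation: small plain matrices stand in for SDXL weights, the block names `content_block`, `style_block`, and `other_block` are hypothetical, and `fake_train` is a placeholder that simply fills in nonzero values where the joint training described above would take place.

```python
import random

# Hypothetical block names; the abstract does not say which SDXL blocks are used.
CONTENT_BLOCK = "content_block"
STYLE_BLOCK = "style_block"


def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def init_lora(d_out, d_in, rank, rng):
    """Usual LoRA initialization: A is small and random, B is zero, so B @ A starts at zero."""
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
    B = [[0.0] * rank for _ in range(d_out)]
    return {"A": A, "B": B}


def apply_loras(base, loras, scale=1.0):
    """Return W + scale * (B @ A) for blocks that have a LoRA; other blocks are left untouched."""
    out = {}
    for name, W in base.items():
        if name in loras:
            delta = matmul(loras[name]["B"], loras[name]["A"])
            out[name] = [[w + scale * d for w, d in zip(rw, rd)] for rw, rd in zip(W, delta)]
        else:
            out[name] = W
    return out


def fake_train(pair, rng):
    """Placeholder for joint training of both B-LoRAs on one image (assumption, not real training)."""
    for lora in pair.values():
        lora["B"] = [[rng.gauss(0.0, 0.1) for _ in row] for row in lora["B"]]


rng = random.Random(0)
dim, rank = 4, 2

# Frozen base weights: three toy blocks, of which only two ever receive a LoRA.
base = {
    name: [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    for name in ("other_block", CONTENT_BLOCK, STYLE_BLOCK)
}

# One pair of B-LoRAs per reference image.
image1 = {b: init_lora(dim, dim, rank, rng) for b in (CONTENT_BLOCK, STYLE_BLOCK)}
image2 = {b: init_lora(dim, dim, rank, rng) for b in (CONTENT_BLOCK, STYLE_BLOCK)}
fake_train(image1, rng)
fake_train(image2, rng)

# Style-content mixing: content B-LoRA from image 1, style B-LoRA from image 2.
mixed = {CONTENT_BLOCK: image1[CONTENT_BLOCK], STYLE_BLOCK: image2[STYLE_BLOCK]}
weights = apply_loras(base, mixed)
```

Because each B-LoRA is stored per block, passing only `{STYLE_BLOCK: image2[STYLE_BLOCK]}` to `apply_loras` would correspond to using the style component on its own, as in text-based generation in a consistent style.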
URL
https://arxiv.org/abs/2403.14572