Abstract
Text-based style transfer is a newly emerging research topic that uses text information instead of a style image to guide the transfer process, significantly extending the application scenarios of style transfer. However, previous methods require either extra optimization time or text-image paired data, limiting their effectiveness. In this work, we propose a data-efficient text-based style transfer method that requires no optimization at the inference stage. Specifically, we convert the text input into the style space of a pre-trained VGG network to realize a more effective style swap. We also leverage CLIP's multi-modal embedding space to learn the text-to-style mapping from an image dataset alone. Our method can transfer arbitrary new styles specified by text input in real time and synthesize high-quality artistic images.
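The abstract outlines a pipeline in which a frozen CLIP text encoder supplies a text embedding, a learned mapping projects that embedding into the style space of a pre-trained VGG network, and the stylized image is produced in a single forward pass. Below is a minimal sketch of one way such a text-to-style mapping could look, assuming the style space is parameterized as channel-wise VGG feature statistics applied via AdaIN; the names (StyleMapper, adain), the layer choice, and the statistics parameterization are illustrative assumptions, not the paper's actual implementation (the abstract itself refers to a style swap, a different feature-replacement operation).

```python
# Minimal sketch of a text-to-style mapping, assuming the "style space" is the
# set of channel-wise mean/std statistics of a VGG feature layer, applied to
# content features via AdaIN. All names here are illustrative, not the
# authors' code.
import torch
import torch.nn as nn

CLIP_DIM = 512   # dimensionality of a CLIP ViT-B/32 text embedding
VGG_CH   = 512   # channels of VGG-19 relu4_1 features (a common choice)

class StyleMapper(nn.Module):
    """Maps a CLIP text embedding to per-channel VGG style statistics."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(CLIP_DIM, 1024), nn.ReLU(),
            nn.Linear(1024, 2 * VGG_CH),   # predicts (mean, std) per channel
        )

    def forward(self, text_emb):
        stats = self.net(text_emb)
        mean, std = stats.chunk(2, dim=-1)
        return mean, std.abs() + 1e-5      # keep std strictly positive

def adain(content_feat, style_mean, style_std):
    """Re-normalizes content features to the predicted style statistics."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + 1e-5
    normalized = (content_feat - c_mean) / c_std
    return normalized * style_std[..., None, None] + style_mean[..., None, None]

# Usage: in practice text_emb would come from CLIP's frozen text encoder
# (e.g. clip.load("ViT-B/32")) and content_feat from a frozen VGG encoder;
# random tensors stand in for both so the sketch runs standalone.
text_emb = torch.randn(1, CLIP_DIM)            # stand-in for a CLIP embedding
content_feat = torch.randn(1, VGG_CH, 32, 32)  # stand-in for VGG features
mapper = StyleMapper()
mean, std = mapper(text_emb)
stylized = adain(content_feat, mean, std)      # would be fed to a decoder
print(stylized.shape)  # torch.Size([1, 512, 32, 32])
```

Because the mapper is trained against CLIP's shared image-text embedding space, it can be supervised with style images alone, which is consistent with the abstract's claim of needing no text-image paired data.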
URL
https://arxiv.org/abs/2301.10916