Abstract
Arbitrary style transfer has attracted widespread research attention and has numerous practical applications. Existing methods, which either employ cross-attention to incorporate deep style attributes into content attributes or use adaptive normalization to adjust content features, fail to generate high-quality stylized images. In this paper, we introduce a technique to improve the quality of stylized images. First, we propose Style Consistency Instance Normalization (SCIN), a method that refines the alignment between content and style features. In addition, we develop an Instance-based Contrastive Learning (ICL) approach that learns the relationships among various styles, thereby enhancing the quality of the resulting stylized images. Recognizing that VGG networks are adept at extracting classification features but less well suited to capturing style features, we also introduce a Perception Encoder (PE) to capture style features. Extensive experiments demonstrate that, compared with existing state-of-the-art methods, our proposed method generates high-quality stylized images and effectively avoids artifacts.
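For context, the adaptive-normalization family of methods the abstract contrasts itself with (e.g. AdaIN-style approaches) re-normalizes content features to match the per-channel statistics of style features. The paper's SCIN refines this alignment; its exact formulation is not given in the abstract, so the sketch below shows only the standard instance-normalization baseline, with NumPy arrays standing in for encoder feature maps:

```python
import numpy as np

def adaptive_instance_norm(content, style, eps=1e-5):
    """Align content feature statistics to style feature statistics.

    content, style: feature maps of shape (C, H, W), as produced by an
    encoder such as VGG. Each content channel is normalized to zero
    mean / unit variance, then rescaled and shifted by the corresponding
    style channel's mean and standard deviation.
    """
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean
```

After this operation, each channel of the output carries the style features' first- and second-order statistics while retaining the content features' spatial structure; a decoder then maps the result back to image space.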
URL
https://arxiv.org/abs/2404.13584