Abstract
A widely used assumption in recent neural style transfer methods is that image styles can be described by global statistics of deep features, such as Gram or covariance matrices. Alternative approaches have represented styles by decomposing them into local pixel or neural patches. Despite recent progress, most existing methods treat the semantic patterns of the style image uniformly, producing unpleasing results on complex styles. In this paper, we introduce a more flexible and general universal style transfer technique: multimodal style transfer (MST). MST explicitly considers the matching of semantic patterns in content and style images. Specifically, the style image features are clustered into sub-style components, which are matched with local content features under a graph cut formulation. A reconstruction network is trained to transfer each sub-style and render the final stylized result. Extensive experiments demonstrate the superior effectiveness, robustness, and flexibility of MST.
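The pipeline described in the abstract (cluster the style features into sub-styles, match them to local content features, then transfer each sub-style) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes K-means for the sub-style clustering, a nearest-centroid assignment in place of the graph-cut matching, and a whitening/coloring transform in place of the learned reconstruction network; the names `multimodal_transfer` and `k` are illustrative.

```python
# Minimal sketch of the multimodal matching idea, under the assumptions stated above.
import numpy as np


def whiten(feat):
    """Whiten features of shape (C, N): zero mean, near-identity covariance."""
    mean = feat.mean(axis=1, keepdims=True)
    centered = feat - mean
    cov = centered @ centered.T / max(feat.shape[1] - 1, 1) + 1e-5 * np.eye(feat.shape[0])
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, 1e-8, None)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T @ centered


def color(whitened, style_feat):
    """Color whitened content features with the statistics of one style cluster."""
    mean = style_feat.mean(axis=1, keepdims=True)
    centered = style_feat - mean
    cov = centered @ centered.T / max(style_feat.shape[1] - 1, 1) + 1e-5 * np.eye(style_feat.shape[0])
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, 1e-8, None)
    return vecs @ np.diag(vals ** 0.5) @ vecs.T @ whitened + mean


def multimodal_transfer(content_feat, style_feat, k=3, seed=0):
    """content_feat, style_feat: (C, N) deep feature maps flattened over space."""
    rng = np.random.default_rng(seed)
    # 1. Cluster style features into k sub-style components (plain K-means sketch).
    centroids = style_feat[:, rng.choice(style_feat.shape[1], k, replace=False)]
    for _ in range(10):
        dists = ((style_feat[:, None, :] - centroids[:, :, None]) ** 2).sum(axis=0)
        style_labels = dists.argmin(axis=0)
        for j in range(k):
            if (style_labels == j).any():
                centroids[:, j] = style_feat[:, style_labels == j].mean(axis=1)
    # 2. Match each content location to a sub-style (nearest centroid here;
    #    the paper formulates this matching as a graph cut).
    c_dists = ((content_feat[:, None, :] - centroids[:, :, None]) ** 2).sum(axis=0)
    content_labels = c_dists.argmin(axis=0)
    # 3. Transfer each sub-style onto its matched content features.
    out = content_feat.copy()
    for j in range(k):
        c_idx = content_labels == j
        s_idx = style_labels == j
        if c_idx.any() and s_idx.any():
            out[:, c_idx] = color(whiten(content_feat[:, c_idx]), style_feat[:, s_idx])
    return out
```

The per-cluster transform here simply aligns second-order statistics within each matched sub-style; in the paper this role is played by a trained reconstruction network that renders the final stylized image.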
URL
https://arxiv.org/abs/1904.04443