Paper Reading AI Learner

Multimodal Style Transfer via Graph Cuts

2019-04-09 03:23:20
Yulun Zhang, Chen Fang, Yilin Wang, Zhaowen Wang, Zhe Lin, Yun Fu, Jimei Yang

Abstract

An assumption widely used in recent neural style transfer methods is that image styles can be described by global statics of deep features like Gram or covariance matrices. Alternative approaches have represented styles by decomposing them into local pixel or neural patches. Despite the recent progress, most existing methods treat the semantic patterns of style image uniformly, resulting unpleasing results on complex styles. In this paper, we introduce a more flexible and general universal style transfer technique: multimodal style transfer (MST). MST explicitly considers the matching of semantic patterns in content and style images. Specifically, the style image features are clustered into sub-style components, which are matched with local content features under a graph cut formulation. A reconstruction network is trained to transfer each sub-style and render the final stylized result. Extensive experiments demonstrate the superior effectiveness, robustness and flexibility of MST.

Abstract (translated)

在最近的神经风格传递方法中,一个广泛使用的假设是图像风格可以用深层特征的全局静力学来描述,如格拉姆或协方差矩阵。替代方法通过将样式分解为局部像素或神经补丁来表示样式。尽管近年来取得了很大的进展,但现有的方法对风格图像的语义模式进行了统一的处理,导致了对复杂风格的不满意。本文介绍了一种更为灵活和通用的通用样式转换技术:多模式样式转换(MST)。MST显式地考虑内容和样式图像中语义模式的匹配。具体来说,样式图像特征被聚集到子样式组件中,这些组件与图形切割公式下的本地内容特征相匹配。重建网络训练转移每个子样式并呈现最终的样式化结果。大量实验证明了该方法的优越性、鲁棒性和灵活性。

URL

https://arxiv.org/abs/1904.04443

PDF

https://arxiv.org/pdf/1904.04443.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot