Mask-guided Style Transfer Network for Purifying Real Images

2019-03-19 03:54:55
Tongtong Zhao, Yuxiao Yan, Jinjia Peng, Huibing Wang, Xianping Fu

Abstract

Recent progress in learning-by-synthesis has made it possible to train models on synthetic images, which can effectively reduce the cost of human and material resources. However, because the distribution of synthetic images differs from that of real images, the desired performance cannot be achieved. To solve this problem, previous methods learned a model that improves the realism of synthetic images. In contrast, this paper purifies real images by extracting discriminative and robust features, converting outdoor real images into indoor synthetic images. We first introduce segmentation masks to construct RGB-mask pairs as inputs, then design a mask-guided style transfer network that learns style features separately from the attention and background regions and learns content features from the full and attention regions. Moreover, we propose a novel region-level task-guided loss to constrain the features learned for style and content. Experiments using mixed (qualitative and quantitative) methods demonstrate the possibility of purifying real images in complex directions. We evaluate the proposed method on several public datasets, including LPW, COCO and MPIIGaze. Experimental results show that the proposed method is effective and achieves state-of-the-art results.
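
The pipeline described in the abstract (RGB-mask pairs as input, style features pooled separately over the attention and background regions, content features taken from the full and attention regions, and a region-level loss over both) can be sketched as follows. This is a minimal PyTorch illustration under assumptions of my own: the toy encoder, the masked Gram-matrix style statistics, and all names (`region_gram`, `MaskGuidedEncoder`, `region_level_loss`) are hypothetical stand-ins, not the authors' actual network or loss.

```python
# Hypothetical sketch of mask-guided region feature separation; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def region_gram(feat, mask):
    """Gram matrix of features restricted to a binary region mask.

    feat: (B, C, H, W) feature map; mask: (B, 1, H, W) with values in {0, 1}.
    Style statistics are accumulated only over pixels inside the region.
    """
    mask = F.interpolate(mask, size=feat.shape[-2:], mode="nearest")
    masked = feat * mask                                   # zero out the other region
    b, c, h, w = masked.shape
    flat = masked.view(b, c, h * w)
    area = mask.view(b, 1, h * w).sum(-1).clamp(min=1.0)   # normalize by region size
    return torch.bmm(flat, flat.transpose(1, 2)) / area.unsqueeze(-1)

class MaskGuidedEncoder(nn.Module):
    """Toy encoder: one shared feature map, then region-wise pooling with the
    segmentation mask (attention region vs. background region)."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, rgb, mask):
        feat = self.net(rgb)
        style_attn = region_gram(feat, mask)               # style from attention region
        style_bkgd = region_gram(feat, 1.0 - mask)         # style from background region
        mask_small = F.interpolate(mask, size=feat.shape[-2:], mode="nearest")
        content_full = feat                                # content from full image
        content_attn = feat * mask_small                   # content from attention region
        return style_attn, style_bkgd, content_full, content_attn

def region_level_loss(pred, target):
    """Stand-in for the region-level task-guided loss: match style statistics
    per region and content features per region (equal weights assumed)."""
    sa_p, sb_p, cf_p, ca_p = pred
    sa_t, sb_t, cf_t, ca_t = target
    style = F.mse_loss(sa_p, sa_t) + F.mse_loss(sb_p, sb_t)
    content = F.mse_loss(cf_p, cf_t) + F.mse_loss(ca_p, ca_t)
    return style + content

if __name__ == "__main__":
    enc = MaskGuidedEncoder()
    rgb = torch.rand(2, 3, 64, 64)                         # real RGB image
    mask = (torch.rand(2, 1, 64, 64) > 0.5).float()        # segmentation mask -> RGB-mask pair
    pred = enc(rgb, mask)
    target = enc(torch.rand(2, 3, 64, 64), mask)           # e.g. synthetic reference
    print(region_level_loss(pred, target).item())
```

The masked Gram matrix is one common way to restrict style statistics to a region: zeroing features outside the mask and normalizing by region area keeps the attention and background style representations independent, which is the separation the abstract describes.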

URL

https://arxiv.org/abs/1903.08152

PDF

https://arxiv.org/pdf/1903.08152.pdf


Tags
Attention Gaze_Estimation Segmentation Style_Transfer