Manipulating Attributes of Natural Scenes via Hallucination

2018-08-22 16:01:18
Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem

Abstract

In this study, we explore building a two-stage framework that lets users directly manipulate high-level attributes of a natural scene. The key to our approach is a deep generative network that can hallucinate images of a scene as if they were taken in a different season (e.g. during winter), under a different weather condition (e.g. on a cloudy day), or at a different time of day (e.g. at sunset). Once the scene is hallucinated with the given attributes, the corresponding look is transferred to the input image while keeping the semantic details intact, giving a photo-realistic manipulation result. Because the proposed framework hallucinates what the scene will look like, it does not require a reference style image, which most appearance or style transfer approaches rely on. Moreover, it allows a given scene to be manipulated simultaneously according to a diverse set of transient attributes within a single model, eliminating the need to train a separate network for each translation task. Our comprehensive set of qualitative and quantitative results demonstrates the effectiveness of our approach against competing methods.
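
The two-stage flow described in the abstract (hallucinate a target look, then transfer that look onto the input) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the `ATTRIBUTE_TINTS` table, and both stage implementations are hypothetical stand-ins. Stage 1 is a fixed colour blend instead of a trained generative network, and stage 2 is simple per-channel mean and standard deviation matching rather than the transfer method used in the paper. Images are represented as flat lists of RGB tuples in [0, 1].

```python
from statistics import fmean, pstdev

# Hypothetical tint colours per attribute (RGB in [0, 1]); illustrative values only.
ATTRIBUTE_TINTS = {
    "winter": (0.85, 0.90, 1.00),
    "sunset": (1.00, 0.60, 0.35),
}


def clamp(x):
    """Keep a channel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, x))


def hallucinate(pixels, attributes):
    """Stage 1 placeholder: a trained generative network would go here.

    Blends every pixel towards a fixed tint for each requested attribute,
    where `attributes` maps an attribute name to a strength in [0, 1].
    """
    out = [tuple(p) for p in pixels]
    for name, strength in attributes.items():
        tint = ATTRIBUTE_TINTS[name]
        out = [
            tuple(clamp((1.0 - strength) * v + strength * t) for v, t in zip(p, tint))
            for p in out
        ]
    return out


def transfer_look(content, target, eps=1e-8):
    """Stage 2 placeholder: match per-channel mean and std of content to target.

    The spatial structure of `content` is untouched; only its colour
    statistics are shifted towards those of `target`.
    """
    channels = []
    for c in range(3):
        src = [p[c] for p in content]
        ref = [p[c] for p in target]
        s_mean, s_std = fmean(src), pstdev(src)
        r_mean, r_std = fmean(ref), pstdev(ref)
        channels.append(
            [clamp((v - s_mean) / (s_std + eps) * r_std + r_mean) for v in src]
        )
    # Regroup the three channel lists back into per-pixel RGB tuples.
    return list(zip(*channels))


def manipulate(pixels, attributes):
    """Run both stages: hallucinate the target look, then transfer it to the input."""
    return transfer_look(pixels, hallucinate(pixels, attributes))
```

Calling `manipulate(pixels, {"sunset": 0.5})` on a small list of RGB tuples returns a list of the same length whose colours are shifted towards the warmer tint. Note that no reference style image is passed in, and several attributes can be combined in one call, which mirrors the two properties the abstract emphasises.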

URL

https://arxiv.org/abs/1808.07413

PDF

https://arxiv.org/pdf/1808.07413.pdf

