Teaching GANs to Sketch in Vector Format

2019-04-07 10:23:47
Varshaneya V, S Balasubramanian, Vineeth N Balasubramanian

Abstract

Sketching is more fundamental to human cognition than speech. Deep Neural Networks (DNNs) have achieved state-of-the-art results in speech-related tasks but have made little progress in generating stroke-based sketches, also known as sketches in vector format. Although Variational Autoencoders (VAEs) exist for generating sketches in vector format, there is no Generative Adversarial Network (GAN) architecture for the same task. In this paper, we propose a standalone GAN architecture, SkeGAN, and a VAE-GAN architecture, VASkeGAN, for sketch generation in vector format. SkeGAN is a stochastic policy in Reinforcement Learning (RL), capable of generating both multidimensional continuous and discrete outputs. VASkeGAN hybridizes a VAE and a GAN in order to couple the efficient data representation of a VAE with the powerful generative capabilities of a GAN, producing visually appealing sketches. We also propose a new metric, the Ske-score, which quantifies the quality of vector sketches. We have validated that SkeGAN and VASkeGAN generate visually appealing sketches using a human Turing test and the Ske-score.
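
To make "vector format" and "both continuous and discrete outputs" concrete, here is a minimal Python sketch. It is not the SkeGAN architecture. It assumes a common convention for vector sketches in which each step is a triple (dx, dy, pen): two continuous pen offsets plus one discrete pen state. The function names, the pen-state encoding, and the fixed Gaussian and categorical parameters are all hypothetical stand-ins for what a trained generator would output.

```python
import random

# Hypothetical discrete pen states (an assumption, not taken from the paper).
PEN_DOWN, PEN_LIFT, END = 0, 1, 2


def sample_step(mean, std, pen_probs, rng):
    """Sample one step: continuous (dx, dy) offsets and a discrete pen state."""
    dx = rng.gauss(mean[0], std[0])
    dy = rng.gauss(mean[1], std[1])
    pen = rng.choices([PEN_DOWN, PEN_LIFT, END], weights=pen_probs, k=1)[0]
    return dx, dy, pen


def sample_sketch(max_steps=20, seed=0):
    """Roll out a toy stochastic policy with fixed (untrained) parameters."""
    rng = random.Random(seed)
    steps = []
    for _ in range(max_steps):
        step = sample_step((0.0, 0.0), (1.0, 1.0), (0.8, 0.15, 0.05), rng)
        steps.append(step)
        if step[2] == END:  # the discrete output decides when the sketch ends
            break
    return steps


def to_strokes(steps, start=(0.0, 0.0)):
    """Turn relative offsets into absolute points, grouped into pen strokes."""
    x, y = start
    strokes, current = [], [(x, y)]
    for dx, dy, pen in steps:
        x, y = x + dx, y + dy
        current.append((x, y))
        if pen != PEN_DOWN:  # pen lifted or sketch ended: close this stroke
            strokes.append(current)
            current = []
    if current:
        strokes.append(current)
    return strokes
```

Calling `to_strokes(sample_sketch())` yields a list of polylines that could be drawn directly or written to SVG. In a learned model, the Gaussian and categorical parameters would be produced by the network at each step instead of being held constant.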

URL

https://arxiv.org/abs/1904.03620

PDF

https://arxiv.org/pdf/1904.03620.pdf

