Paper Reading AI Learner

Auto-Encoding Progressive Generative Adversarial Networks For 3D Multi Object Scenes

2019-03-08 14:53:16
Vedant Singh, Manan Oza, Himanshu Vaghela, Pratik Kanani

Abstract

3D multi-object generative models allow us to synthesize a wide range of novel 3D multi-object scenes and to identify objects, shapes, layouts and their positions. However, multi-object scenes are difficult to generate because the underlying data is multimodal in nature. Conventional 3D generative adversarial models are not effective at generating multi-object scenes: they tend to generate either a single object or fuzzy results containing several objects. Auto-encoder models have considerable scope for feature extraction and representation learning in probabilistic spaces under the unsupervised paradigm, and we try to exploit this property in our proposed model. In this paper we propose a novel architecture using 3D ConvNets trained with the progressive training paradigm that is able to generate realistic, high-resolution 3D scenes of rooms, bedrooms, offices, etc. with various pieces of furniture and objects. We use an adversarial auto-encoder together with a WGAN-GP term in our discriminator loss function. Finally, this new approach to multi-object scene generation is also able to generate more objects per scene.
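
The abstract names a WGAN-GP term in the discriminator loss without spelling it out. As a rough illustration only, the following is a minimal sketch of the standard WGAN-GP critic loss (Gulrajani et al., 2017), L = E[D(x_fake)] - E[D(x_real)] + λ E[(||∇D(x̂)||₂ - 1)²] with x̂ = ε x_real + (1 - ε) x_fake and ε ~ U[0, 1], applied to voxel scenes. It is not the authors' implementation: it assumes scenes are flattened occupancy grids, uses a hypothetical linear critic (`make_linear_critic`) in place of a 3D ConvNet, estimates gradients with finite differences instead of automatic differentiation, and leaves out the adversarial auto-encoder and the progressive growing schedule. The default λ = 10 follows the WGAN-GP paper, not this one.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

# Assumption: a voxel scene (e.g. a D x H x W occupancy grid) is flattened
# into a 1-D list of floats; the critic maps such a list to a scalar score.
Voxels = Sequence[float]
Critic = Callable[[Voxels], float]


def make_linear_critic(weights: Sequence[float]) -> Critic:
    """Hypothetical toy critic D(x) = w . x, standing in for a 3D ConvNet."""
    w = list(weights)
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def finite_diff_grad(critic: Critic, x: Voxels, h: float = 1e-4) -> List[float]:
    """Central-difference estimate of dD/dx (replaces autodiff in this sketch)."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((critic(x_plus) - critic(x_minus)) / (2.0 * h))
    return grad


def wgan_gp_critic_loss(critic: Critic,
                        real_batch: Sequence[Voxels],
                        fake_batch: Sequence[Voxels],
                        lam: float = 10.0,
                        rng: Optional[random.Random] = None) -> float:
    """Critic loss E[D(fake)] - E[D(real)] + lam * E[(||grad D(x_hat)|| - 1)^2]."""
    rng = rng if rng is not None else random.Random(0)
    n = len(real_batch)
    # Wasserstein estimate: minimizing it pushes real scores above generated ones.
    wasserstein = (sum(critic(f) for f in fake_batch)
                   - sum(critic(r) for r in real_batch)) / n
    penalty = 0.0
    for real, fake in zip(real_batch, fake_batch):
        # Sample a point on the straight line between a real and a generated scene.
        eps = rng.random()
        x_hat = [eps * r + (1.0 - eps) * f for r, f in zip(real, fake)]
        grad = finite_diff_grad(critic, x_hat)
        grad_norm = math.sqrt(sum(g * g for g in grad))
        # Two-sided penalty pulling the gradient norm toward 1 (soft 1-Lipschitz constraint).
        penalty += (grad_norm - 1.0) ** 2
    return wasserstein + lam * penalty / n


if __name__ == "__main__":
    # A 2 x 2 x 2 grid flattened to 8 voxels; the weights have L2 norm 1,
    # so the gradient penalty vanishes and the loss is close to -2.0.
    critic = make_linear_critic([0.5] * 4 + [0.0] * 4)
    real = [[1.0] * 8, [1.0] * 8]
    fake = [[0.0] * 8, [0.0] * 8]
    print(wgan_gp_critic_loss(critic, real, fake))
```

Because the toy critic is linear, its input gradient is simply its weight vector, which makes the penalty easy to check by hand: weights with norm 2 add λ(2 - 1)² = 10 to the loss.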

URL

https://arxiv.org/abs/1903.03477

PDF

https://arxiv.org/pdf/1903.03477.pdf
