Semi-Supervised Learning of Multi-Object 3D Scene Representations

2020-10-08 14:49:23

Cathrin Elich, Martin R. Oswald, Marc Pollefeys, Jörg Stückler

arXiv_CV Pose 3D Self-Supervised

Abstract

Representing scenes at the granularity of objects is a prerequisite for scene understanding and decision making. We propose a novel approach for learning multi-object 3D scene representations from images. A recurrent encoder regresses a latent representation of the 3D shape, pose and texture of each object from an input RGB image. The 3D shapes are represented continuously in function space as signed distance functions (SDFs), which we efficiently pre-train from example shapes in a supervised way. Using differentiable rendering, we then train our model to decompose scenes in a self-supervised manner from RGB-D images. Our approach learns to decompose images into the constituent objects of the scene and to infer their shape, pose and texture from a single view. We evaluate the accuracy of our model in inferring the 3D scene layout and demonstrate its generative capabilities.
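
To make the signed distance function idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes analytic spheres in place of the learned per-object SDFs described in the abstract, composes them into a scene by taking the pointwise minimum, and finds the depth along a camera ray by sphere tracing. The function names (sphere_sdf, scene_sdf, sphere_trace) and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    return np.linalg.norm(points - center, axis=-1) - radius

def scene_sdf(points, spheres):
    """Multi-object scene as the pointwise minimum over per-object SDFs (a union of shapes)."""
    return np.min([sphere_sdf(points, c, r) for c, r in spheres], axis=0)

def sphere_trace(origin, direction, spheres, max_steps=64, eps=1e-4, max_depth=10.0):
    """March along a ray, stepping by the current SDF value.

    Returns the depth of the first surface hit, or None if the ray misses.
    """
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    depth = 0.0
    for _ in range(max_steps):
        distance = float(scene_sdf(origin + depth * direction, spheres))
        if distance < eps:
            return depth          # close enough to a surface: report a hit
        depth += distance         # safe step: no surface is nearer than `distance`
        if depth > max_depth:
            return None           # ray left the region of interest
    return None

# Assumed toy scene: two spheres given as (center, radius) pairs.
spheres = [((0.0, 0.0, 3.0), 1.0), ((2.0, 0.0, 5.0), 0.5)]
hit_depth = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), spheres)
```

In this toy setup, every operation is a composition of norms, minima and additions, which is what makes such a renderer amenable to gradient-based training in an automatic differentiation framework; a learned SDF would simply replace sphere_sdf with a network evaluated at the query points.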

URL

https://arxiv.org/abs/2010.04030

PDF

https://arxiv.org/pdf/2010.04030.pdf