Learning Canonical Representations for Scene Graph to Image Generation

2019-12-16 14:39:45

Roei Herzig, Amir Bar, Huijuan Xu, Gal Chechik, Trevor Darrell, Amir Globerson

arXiv_CV Prediction Pose

Abstract

Generating realistic images of complex visual scenes becomes very challenging when one wishes to control the structure of the generated images. Previous approaches showed that scenes with few entities can be controlled using scene graphs, but this approach struggles as the complexity of the graph (the number of objects and edges) increases. Moreover, current approaches fail to generalize as the number of objects varies, or when given different input graphs that are semantically equivalent. In this work, we propose a novel approach to mitigate these issues. We present a model that inherently learns canonical graph representations, thus ensuring that semantically similar scene graphs result in similar predictions. In addition, the proposed model better captures object representations independently of the number of objects in the graph. We show improved performance of the model on three benchmarks: Visual Genome, COCO, and CLEVR.
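
To make the idea of a canonical representation concrete, the following is a minimal hand-written sketch, not the learned model described in the abstract. It assumes a tiny hypothetical relation vocabulary (`CONVERSE`, `TRANSITIVE`) and a hypothetical helper `canonicalize` that closes a set of (subject, relation, object) triples under converse and transitive relations, so that two differently written but semantically equivalent scene graphs map to the same set of triples.

```python
# Illustrative only: a rule-based canonicalization of scene graph triples.
# The relation vocabulary and the helper below are assumptions made for this
# sketch; the paper's model learns its canonical representation instead.

# Hypothetical vocabulary: each relation and its converse.
CONVERSE = {
    "left of": "right of",
    "right of": "left of",
    "above": "below",
    "below": "above",
}

# Hypothetical set of relations treated as transitive.
TRANSITIVE = {"left of", "right of", "above", "below"}


def canonicalize(triples):
    """Close (subject, relation, object) triples under converse and transitivity."""
    closed = set(triples)
    while True:
        new = set()
        # Converse rule: (s, r, o) implies (o, converse(r), s).
        for s, r, o in closed:
            if r in CONVERSE:
                new.add((o, CONVERSE[r], s))
        # Transitive rule: (a, r, b) and (b, r, c) imply (a, r, c).
        for s1, r1, o1 in closed:
            if r1 not in TRANSITIVE:
                continue
            for s2, r2, o2 in closed:
                if r1 == r2 and o1 == s2 and s1 != o2:
                    new.add((s1, r1, o2))
        if new <= closed:
            # Fixed point reached: no rule adds anything new.
            return frozenset(closed)
        closed |= new


# Two graphs that describe the same layout with different relations.
graph_a = [("cat", "left of", "dog"), ("dog", "left of", "tree")]
graph_b = [("dog", "right of", "cat"), ("tree", "right of", "dog")]

same_canonical_form = canonicalize(graph_a) == canonicalize(graph_b)
```

In this sketch both inputs expand to the same closed set of triples, which is the property the abstract asks for: semantically equivalent graphs yield the same representation and hence the same downstream prediction. A learned approach would replace the fixed rules with parameters estimated from data.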

URL

https://arxiv.org/abs/1912.07414

PDF

https://arxiv.org/pdf/1912.07414.pdf