DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics

Abstract

We introduce the first work to explore web-scale diffusion models for robotics. DALL-E-Bot enables a robot to rearrange objects in a scene by first inferring a text description of those objects, then generating an image representing a natural, human-like arrangement of those objects, and finally physically arranging the objects according to that image. The significance is that we achieve this zero-shot using DALL-E, without needing any further data collection or training. Encouraging real-world results with human studies show that this is an exciting direction for the future of web-scale robot learning algorithms. We also propose a list of recommendations to the text-to-image community, to align further development of these models with applications to robotics. Videos are available at: this https URL
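
The three stages named in the abstract (describe the objects, generate a goal image, arrange according to it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `SceneObject` type, the 2D `Pose`, the prompt wording, and the `fake_generator` stand-in are all hypothetical, and the generator here returns hard-coded goal positions rather than calling DALL-E or localising objects in a generated image. The sketch also assumes each object label appears only once in the scene.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical simplification: a pose is just an (x, y) tabletop position.
Pose = Tuple[float, float]


@dataclass(frozen=True)
class SceneObject:
    label: str  # object name, e.g. "fork"
    pose: Pose  # where the object currently sits


def describe_scene(objects: List[SceneObject]) -> str:
    """Stage 1: turn the observed objects into a text prompt.

    The prompt wording is an assumption, not taken from the paper.
    """
    labels = ", ".join(sorted(obj.label for obj in objects))
    return f"a neatly arranged table with: {labels}"


def plan_rearrangement(
    objects: List[SceneObject],
    generate_goal: Callable[[str], Dict[str, Pose]],
) -> Dict[str, Tuple[Pose, Pose]]:
    """Stages 2 and 3: obtain goal poses, then pair current and goal poses.

    `generate_goal` stands in for "generate an image from the prompt and
    read off where each object is placed in it".
    """
    prompt = describe_scene(objects)
    goal_poses = generate_goal(prompt)
    # Only plan moves for objects the generator actually placed.
    return {
        obj.label: (obj.pose, goal_poses[obj.label])
        for obj in objects
        if obj.label in goal_poses
    }


def fake_generator(prompt: str) -> Dict[str, Pose]:
    """Hypothetical stand-in for a text-to-image model plus localisation."""
    return {"fork": (0.3, 0.5), "plate": (0.5, 0.5), "knife": (0.7, 0.5)}


if __name__ == "__main__":
    scene = [
        SceneObject("plate", (0.1, 0.9)),
        SceneObject("knife", (0.8, 0.2)),
        SceneObject("fork", (0.4, 0.1)),
    ]
    for label, (start, goal) in plan_rearrangement(scene, fake_generator).items():
        print(f"move {label} from {start} to {goal}")
```

Passing the generator in as a callable keeps the planning logic independent of any particular image model, which mirrors the zero-shot framing of the abstract: the model is used as-is, with no training step in the loop.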

URL

https://arxiv.org/abs/2210.02438

PDF

https://arxiv.org/pdf/2210.02438.pdf