Abstract
Diffusion models have emerged as powerful generative models in the text-to-image domain. This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments. Human behaviour is stochastic and multimodal, with structured correlations between action dimensions. Meanwhile, standard modelling choices in behaviour cloning are limited in their expressiveness and may introduce bias into the cloned policy. We begin by pointing out the limitations of these choices. We then propose that diffusion models are an excellent fit for imitating human behaviour, since they learn an expressive distribution over the joint action space. We introduce several innovations to make diffusion models suitable for sequential environments: designing suitable architectures, investigating the role of guidance, and developing reliable sampling strategies. Experimentally, diffusion models closely match human demonstrations in a simulated robotic control task and a modern 3D gaming environment.
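The core idea of an observation-to-action diffusion policy can be illustrated with a minimal DDPM-style reverse-sampling loop: starting from Gaussian noise, an action is iteratively denoised while conditioning on the current observation at every step. The sketch below is not the paper's implementation; `toy_denoiser` is a hypothetical stand-in for a trained noise-prediction network, and the schedule values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(obs, noisy_action, t):
    # Hypothetical stand-in for a trained network epsilon_theta(obs, a_t, t).
    # For illustration only: it pretends the demonstrated action is tanh(obs)
    # and "predicts" the noise as the residual from that target.
    target = np.tanh(obs)
    return noisy_action - target

def sample_action(obs, n_steps=50, action_dim=2):
    # DDPM-style reverse process over the joint action space:
    # start from a_T ~ N(0, I) and denoise step by step,
    # conditioning on the observation throughout.
    betas = np.linspace(1e-4, 0.02, n_steps)  # illustrative noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    a = rng.standard_normal(action_dim)
    for t in reversed(range(n_steps)):
        eps = toy_denoiser(obs, a, t)
        # Posterior mean: (a_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject noise at all but the final step, giving a
            # stochastic, multimodal action distribution.
            a += np.sqrt(betas[t]) * rng.standard_normal(action_dim)
    return a

action = sample_action(np.zeros(2))
```

Because the final sample depends on the injected noise at every step, repeated calls yield different actions for the same observation, which is how the policy captures multimodal human behaviour rather than averaging modes.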
URL
https://arxiv.org/abs/2301.10677