Abstract
Self-play has powered breakthroughs in two-player and multi-player games. Here we show that self-play is a surprisingly effective strategy in another domain. We show that robust and naturalistic driving emerges entirely from self-play in simulation at unprecedented scale: 1.6 billion km of driving. This is enabled by Gigaflow, a batched simulator that can synthesize and train on 42 years of subjective driving experience per hour on a single 8-GPU node. The resulting policy achieves state-of-the-art performance on three independent autonomous driving benchmarks. The policy outperforms the prior state of the art when tested on recorded real-world scenarios, amidst human drivers, without ever seeing human data during training. The policy is realistic when assessed against human references and achieves unprecedented robustness, averaging 17.5 years of continuous driving between incidents in simulation.
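To put the throughput figure in perspective, the short sketch below converts "42 years of subjective driving experience per hour" into a faster-than-real-time factor. It is a back-of-the-envelope calculation, not part of Gigaflow: the function names are hypothetical, a year is assumed to be 365.25 days, and the per-GPU figure assumes the work is split evenly across the 8 GPUs, which the abstract does not state.

```python
# Back-of-the-envelope conversion of the abstract's throughput claim.
# Assumption: one year = 365.25 days (Julian year).
HOURS_PER_YEAR = 365.25 * 24


def realtime_factor(years_per_hour: float) -> float:
    """Simulated hours of experience produced per wall-clock hour."""
    return years_per_hour * HOURS_PER_YEAR


def per_gpu_factor(years_per_hour: float, num_gpus: int) -> float:
    """Real-time factor per GPU, assuming an even split (an assumption)."""
    return realtime_factor(years_per_hour) / num_gpus


# Figures taken from the abstract: 42 years per hour on one 8-GPU node.
factor = realtime_factor(42.0)
per_gpu = per_gpu_factor(42.0, 8)

print(f"Node-level speed-up over real time: {factor:,.0f}x")
print(f"Per-GPU speed-up (even split assumed): {per_gpu:,.1f}x")
```

Under these assumptions the node generates roughly 368,000 hours of driving experience per wall-clock hour, or about 46,000 per GPU.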
URL
https://arxiv.org/abs/2502.03349