Abstract
World models are critical for autonomous driving, where they simulate environmental dynamics and generate synthetic data. Existing methods struggle to disentangle ego-vehicle motion (perspective shifts) from scene evolution (agent interactions), leading to suboptimal predictions. To address this, we propose separating environmental changes from ego-motion by leveraging scene-centric coordinate systems. In this paper, we introduce COME: a framework that integrates scene-centric forecasting Control into the Occupancy world ModEl. Specifically, COME first generates ego-irrelevant, spatially consistent future features through a scene-centric prediction branch; these features are then converted into scene conditions by a tailored ControlNet. The condition features are subsequently injected into the occupancy world model, enabling more accurate and controllable future occupancy predictions. Experimental results on the nuScenes-Occ3D dataset show that COME achieves consistent and significant improvements over state-of-the-art (SOTA) methods across diverse configurations, including different input sources (ground-truth, camera-based, and fusion-based occupancy) and prediction horizons (3s and 8s). For example, under the same settings, COME achieves a 26.3% better mIoU than DOME and a 23.7% better mIoU than UniScene. These results highlight the efficacy of disentangled representation learning in enhancing spatio-temporal prediction fidelity for world models. Code and videos will be made available.
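The distinction between ego-centric and scene-centric coordinates can be illustrated with a small sketch. This is not the COME implementation; it is a toy 2D example under assumed conventions (the helper names `pose_matrix` and `transform`, the pose values, and the choice of the first ego pose as the scene frame are all hypothetical). It shows that a static object appears to move when observed from a moving ego frame, but stays fixed once the observations are mapped into a single scene-centric frame.

```python
import numpy as np


def pose_matrix(x, y, yaw):
    """Build a 3x3 homogeneous 2D pose mapping ego-frame points to the world frame."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, x], [s, c, y], [0.0, 0.0, 1.0]])


def transform(points, matrix):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of points."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    return (homo @ matrix.T)[:, :2]


# Hypothetical ego trajectory: (x, y, yaw) at three time steps.
ego_states = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.1), (4.0, 0.2, 0.2)]

# Assumption: the scene-centric frame is anchored at the first ego pose.
scene_pose = pose_matrix(*ego_states[0])

# A static object at a fixed world position.
static_world = np.array([[10.0, 0.0]])

ego_views, scene_views = [], []
for state in ego_states:
    ego_pose = pose_matrix(*state)
    # What the ego vehicle observes: world -> current ego frame.
    in_ego = transform(static_world, np.linalg.inv(ego_pose))
    # Re-express the observation in the fixed scene frame: ego -> world -> scene.
    in_scene = transform(in_ego, np.linalg.inv(scene_pose) @ ego_pose)
    ego_views.append(in_ego[0])
    scene_views.append(in_scene[0])

ego_views = np.array(ego_views)
scene_views = np.array(scene_views)
```

In this sketch `ego_views` changes at every step purely because of ego-motion, while `scene_views` stays constant, so any remaining change in a scene-centric representation would have to come from the scene itself.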
URL
https://arxiv.org/abs/2506.13260