Abstract
The recent advancement of generative foundation models has ushered in a new era of image generation for natural images, revolutionizing art design, entertainment, environment simulation, and beyond. Despite producing high-quality samples, existing methods are constrained to generating images of scenes at a limited scale. In this paper, we present MetaEarth, a generative foundation model that breaks this barrier by scaling image generation to the global level, exploring the creation of worldwide, multi-resolution, unbounded, and virtually limitless remote sensing images. In MetaEarth, we propose a resolution-guided self-cascading generative framework that can generate images of any region across a wide range of geographical resolutions. To achieve unbounded and arbitrary-sized image generation, we design a novel noise sampling strategy for denoising diffusion models based on an analysis of the generation conditions and the initial noise. To train MetaEarth, we construct a large dataset of multi-resolution optical remote sensing images annotated with geographical information. Experiments demonstrate the strong capability of our method to generate global-scale images. Additionally, MetaEarth serves as a data engine that can provide high-quality, diverse training data for downstream tasks. Our model opens up new possibilities for constructing generative world models that simulate Earth visuals from an innovative overhead perspective.
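The abstract does not describe how MetaEarth's noise sampling strategy works. As a generic illustration of why the initial noise matters for unbounded, arbitrary-sized generation, here is a minimal sketch under our own assumptions. It is not the paper's method. The sketch derives the initial diffusion noise deterministically from global canvas coordinates, so any two windows that overlap receive identical noise in the shared region. The function names, the 64-pixel block size, and the 4-channel latent shape are hypothetical choices made for illustration.

```python
import numpy as np


def _zigzag(v: int) -> int:
    # Map a signed block index to a non-negative int, since NumPy seed entropy must be non-negative.
    return 2 * v if v >= 0 else -2 * v - 1


def _block_noise(bx: int, by: int, channels: int, block: int, seed: int) -> np.ndarray:
    # Hypothetical helper: each fixed-size block of the infinite canvas gets its own
    # reproducible Gaussian noise, seeded by (global seed, block x, block y).
    rng = np.random.default_rng([seed, _zigzag(bx), _zigzag(by)])
    return rng.standard_normal((channels, block, block))


def noise_window(x0: int, y0: int, h: int, w: int,
                 channels: int = 4, block: int = 64, seed: int = 0) -> np.ndarray:
    """Return initial noise of shape (channels, h, w) for the window whose top-left
    corner sits at global coordinates (x0, y0). Overlapping windows share noise."""
    out = np.empty((channels, h, w))
    # Range of block indices touched by the window (floor division handles negatives).
    bx0, bx1 = x0 // block, (x0 + w - 1) // block
    by0, by1 = y0 // block, (y0 + h - 1) // block
    for by in range(by0, by1 + 1):
        for bx in range(bx0, bx1 + 1):
            tile = _block_noise(bx, by, channels, block, seed)
            # Intersection of this block with the window, in global coordinates.
            gx0, gx1 = max(x0, bx * block), min(x0 + w, (bx + 1) * block)
            gy0, gy1 = max(y0, by * block), min(y0 + h, (by + 1) * block)
            out[:, gy0 - y0:gy1 - y0, gx0 - x0:gx1 - x0] = tile[
                :, gy0 - by * block:gy1 - by * block, gx0 - bx * block:gx1 - bx * block
            ]
    return out


# Usage: two overlapping 96x96 windows agree on their shared 64x64 region.
a = noise_window(0, 0, 96, 96)
b = noise_window(32, 32, 96, 96)
print(np.array_equal(a[:, 32:, 32:], b[:, :64, :64]))
```

Because the noise is a function of position rather than of the order in which tiles are sampled, a canvas can in principle be extended in any direction, including negative coordinates, without regenerating earlier tiles. A real diffusion pipeline would also need the conditioning inputs to agree across tile boundaries, which this sketch does not address.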
URL
https://arxiv.org/abs/2405.13570