MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation

2024-05-22 12:07:47
Zhiping Yu, Chenyang Liu, Liqin Liu, Zhenwei Shi, Zhengxia Zou

Abstract

Recent advances in generative foundation models have ushered in a new era of natural-image generation, transforming art design, entertainment, environment simulation, and beyond. Although existing methods produce high-quality samples, they are constrained to generating images of scenes at a limited scale. In this paper, we present MetaEarth, a generative foundation model that breaks this barrier by scaling image generation to a global level, exploring the creation of worldwide, multi-resolution, unbounded remote sensing images. In MetaEarth, we propose a resolution-guided self-cascading generative framework, which enables image generation for any region across a wide range of geographical resolutions. To achieve unbounded, arbitrary-sized image generation, we design a novel noise sampling strategy for denoising diffusion models by analyzing the generation conditions and the initial noise. To train MetaEarth, we construct a large dataset of multi-resolution optical remote sensing images with geographical information. Experiments demonstrate the capabilities of our method in generating global-scale images. Additionally, MetaEarth serves as a data engine that can provide high-quality, rich training data for downstream tasks. Our model opens up new possibilities for constructing generative world models by simulating Earth visuals from an overhead perspective.

URL

https://arxiv.org/abs/2405.13570

PDF

https://arxiv.org/pdf/2405.13570.pdf
