Paper Reading AI Learner

Bridging Remote Sensors with Multisensor Geospatial Foundation Models

2024-04-01 17:30:56
Boran Han, Shuai Zhang, Xingjian Shi, Markus Reichstein

Abstract

In the realm of geospatial analysis, the diversity of remote sensors, encompassing both optical and microwave technologies, offers a wealth of distinct observational capabilities. Recognizing this, we present msGFM, a multisensor geospatial foundation model that effectively unifies data from four key sensor modalities. This integration spans an expansive dataset of two million multisensor images. msGFM is uniquely adept at handling both paired and unpaired sensor data. For data originating from identical geolocations, our model employs an innovative cross-sensor pretraining approach in masked image modeling, enabling the synthesis of joint representations from diverse sensors. By incorporating four remote sensors, msGFM maintains strong performance across sensor types, forming a comprehensive model adaptable to each. msGFM demonstrates enhanced proficiency in a range of both single-sensor and multisensor downstream tasks, including scene classification, segmentation, cloud removal, and pan-sharpening. A key discovery of our research is that representations derived from natural images are not always compatible with the distinct characteristics of geospatial remote sensors, underscoring the limitations of existing representations in this field. Our work can serve as a guide for developing multisensor geospatial pretraining models, paving the way for more advanced geospatial capabilities.
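The cross-sensor pretraining idea described above can be sketched in a few lines: for paired images from the same geolocation, mask patches of one sensor and reconstruct them from the visible patches of both sensors through a shared projection. The sketch below is a minimal, hypothetical illustration in NumPy (the `patchify`, `cross_sensor_mim_step`, and the random linear "encoder" are stand-ins of our own, not the paper's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(image, patch=4):
    """Split an HxWxC image into flattened non-overlapping patches."""
    h, w, c = image.shape
    patches = image.reshape(h // patch, patch, w // patch, patch, c)
    return patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

def cross_sensor_mim_step(img_a, img_b, mask_ratio=0.6, patch=4):
    """One illustrative cross-sensor masked-image-modeling step.

    Masks patches of sensor A and 'reconstructs' them from the visible
    patches of BOTH sensors via a shared (random, untrained) linear
    projection, a stand-in for the joint encoder. Returns the MSE.
    """
    pa, pb = patchify(img_a, patch), patchify(img_b, patch)
    n = pa.shape[0]
    n_mask = int(mask_ratio * n)
    masked = rng.choice(n, size=n_mask, replace=False)
    visible = np.setdiff1d(np.arange(n), masked)

    # Joint representation: visible patches of both sensors through
    # one shared weight matrix, pooled into a single code.
    w_shared = rng.standard_normal((pa.shape[1], pa.shape[1])) * 0.01
    joint = np.concatenate([pa[visible], pb[visible]], axis=0) @ w_shared
    pred = np.tile(joint.mean(axis=0), (n_mask, 1))  # naive decoder

    return float(np.mean((pred - pa[masked]) ** 2))

# Toy paired images from two hypothetical sensors at one geolocation.
optical = rng.random((16, 16, 3))
sar = rng.random((16, 16, 3))
loss = cross_sensor_mim_step(optical, sar)
print(loss >= 0.0)  # prints True: the loss is a non-negative MSE
```

In the actual model, the random projection would be a transformer encoder trained end-to-end, and the reconstruction target would drive both sensors toward a joint representation; the sketch only shows the masking-and-reconstruct data flow.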

URL

https://arxiv.org/abs/2404.01260

PDF

https://arxiv.org/pdf/2404.01260.pdf
