Abstract
In geospatial analysis, the diversity of remote sensors, encompassing both optical and microwave technologies, offers a wealth of distinct observational capabilities. Recognizing this, we present msGFM, a multisensor geospatial foundation model that unifies data from four key sensor modalities. This integration spans a dataset of two million multisensor images. msGFM handles both paired and unpaired sensor data. For data originating from identical geolocations, our model employs a cross-sensor pretraining approach in masked image modeling, enabling the synthesis of joint representations from diverse sensors. While incorporating four remote sensors, msGFM maintains strong performance, forming a comprehensive model adaptable to various sensor types. msGFM demonstrates improved performance across a range of single-sensor and multisensor downstream tasks, including scene classification, segmentation, cloud removal, and pan-sharpening. A key finding of our research is that representations derived from natural images are not always compatible with the distinct characteristics of geospatial remote sensors, underscoring the limitations of existing representations in this field. Our work can serve as a guide for developing multisensor geospatial pretraining models, paving the way for more advanced geospatial capabilities.
URL
https://arxiv.org/abs/2404.01260