Abstract
The field of Earth Observations (EO) offers a wealth of data from diverse sensors, presenting a great opportunity for advancing self-supervised multimodal learning. However, current multimodal EO datasets and models focus on a single data type, either mono-date images or time series, which limits their expressivity. We introduce OmniSat, a novel architecture that exploits the spatial alignment between multiple EO modalities to learn expressive multimodal representations without labels. To demonstrate the advantages of combining modalities of different natures, we augment two existing datasets with new modalities. As demonstrated on three downstream tasks (forestry, land cover classification, and crop mapping), OmniSat can learn rich representations in an unsupervised manner, leading to improved performance in the semi- and fully-supervised settings, even when only one modality is available for inference. The code and dataset are available at this http URL.
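The abstract's key idea is that spatially aligned modalities provide free supervision: patch i in one sensor's data covers the same ground as patch i in another's, so matching pairs can be used as positive examples without labels. Below is a minimal, hypothetical sketch of this principle using a symmetric InfoNCE-style contrastive loss over aligned patch features; this is one common way to exploit such alignment, not the actual OmniSat objective or architecture.

```python
import numpy as np

def info_nce(feats_a: np.ndarray, feats_b: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss over spatially aligned patch features of shape (N, D).

    Row i of feats_a and row i of feats_b are assumed to describe the same
    ground patch in two different modalities (e.g. optical vs. radar).
    """
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (N, N) cosine-similarity matrix
    # Cross-entropy with the diagonal (the aligned pairs) as targets, both directions.
    log_softmax_ab = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_softmax_ba = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_ab = -np.mean(np.diag(log_softmax_ab))
    loss_ba = -np.mean(np.diag(log_softmax_ba))
    return 0.5 * (loss_ab + loss_ba)

# Toy data: each patch has some latent content seen (noisily) by both sensors.
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 16))                       # latent content per patch
feats_opt = shared + 0.05 * rng.normal(size=(8, 16))    # "optical" features
feats_sar = shared + 0.05 * rng.normal(size=(8, 16))    # "radar" features

aligned = info_nce(feats_opt, feats_sar)
shuffled = info_nce(feats_opt, feats_sar[::-1].copy())  # break the spatial alignment
# Correctly aligned pairs should yield a lower contrastive loss than shuffled ones.
print(aligned, shuffled)
```

The gap between the aligned and shuffled losses is exactly the signal a self-supervised model trains on: it pushes each modality's encoder toward representations that agree on co-located patches.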
URL
https://arxiv.org/abs/2404.08351