Abstract
Deep learning models are essential for scene classification, change detection, land cover segmentation, and other remote sensing image understanding tasks. The backbones of most existing remote sensing deep learning models are initialized with weights obtained from ImageNet pre-training (IMP). However, a domain gap exists between remote sensing images and natural images (e.g., those in ImageNet), so models initialized with IMP weights perform poorly on remote sensing image understanding. Although some pre-training methods have been studied in the remote sensing community, they use only remote sensing images and therefore suffer from limited generalization. In this paper, we propose a novel remote sensing pre-training framework, Generic Knowledge Boosted Remote Sensing Pre-training (GeRSP), which learns robust representations from both remote sensing and natural images for remote sensing understanding tasks. GeRSP contains two pre-training branches: (1) a self-supervised pre-training branch that learns domain-related representations from unlabeled remote sensing images, and (2) a supervised pre-training branch that learns general knowledge from labeled natural images. GeRSP combines the two branches in a teacher-student architecture so that a single model simultaneously learns representations carrying both general and domain-specific knowledge, yielding a strong pre-trained model for initializing downstream deep learning models. Finally, we evaluate GeRSP and other remote sensing pre-training methods on three downstream tasks: object detection, semantic segmentation, and scene classification. The experimental results consistently show that GeRSP learns robust representations in a unified manner and improves performance on these remote sensing downstream tasks.
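The two-branch objective described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the branches are combined, so the sketch assumes an InfoNCE-style contrastive loss for the self-supervised branch, a cross-entropy loss for the supervised branch, a weighted sum of the two, and a teacher updated as an exponential moving average of the student. All function names, the temperature, the loss weight, and the momentum value are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def log_softmax_at(logits, index):
    """Return log(softmax(logits)[index]) using a numerically stable form."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[index] - log_z


def contrastive_loss(student_feat, teacher_feat, negatives, temperature=0.2):
    """Assumed self-supervised branch: InfoNCE between the student's view of an
    unlabeled remote sensing image and the teacher's view of the same image,
    contrasted against features of other images."""
    q = l2_normalize(student_feat)
    keys = [l2_normalize(teacher_feat)] + [l2_normalize(n) for n in negatives]
    logits = [dot(q, k) / temperature for k in keys]
    return -log_softmax_at(logits, 0)  # the positive key sits at index 0


def supervised_loss(class_logits, label):
    """Assumed supervised branch: cross-entropy on a labeled natural image."""
    return -log_softmax_at(class_logits, label)


def combined_loss(ssl_loss, sup_loss, sup_weight=1.0):
    """Hypothetical weighted sum of the two branch losses."""
    return ssl_loss + sup_weight * sup_loss


def ema_update(teacher_params, student_params, momentum=0.999):
    """Assumed teacher update: exponential moving average of student weights."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

In a training loop under these assumptions, each step would draw one batch of unlabeled remote sensing images and one batch of labeled natural images, compute both losses with the shared student backbone, back-propagate `combined_loss` through the student only, and then call `ema_update` to refresh the teacher.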
URL
https://arxiv.org/abs/2401.04614