Abstract
Self-supervised learning (SSL) has emerged as a promising approach for remote sensing image classification due to its ability to leverage large amounts of unlabeled data. In contrast to traditional supervised learning, SSL aims to learn representations of data without the need for explicit labels. This is achieved by formulating auxiliary tasks that create pseudo-labels for the unlabeled data, from which pre-trained models are learned. The pre-trained models can then be fine-tuned on downstream tasks such as remote sensing image scene classification. This paper analyzes the effectiveness of SSL pre-training on Million AID, a large unlabeled remote sensing dataset, with various remote sensing image scene classification datasets serving as downstream tasks. More specifically, we evaluate SSL pre-training using the iBOT framework coupled with Vision Transformers (ViT), in contrast to supervised pre-training of ViT on the ImageNet dataset. Comprehensive experiments across 14 datasets with diverse properties reveal that in-domain SSL leads to improved predictive performance of models compared to their supervised counterparts.
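The pre-train-then-fine-tune workflow described above can be illustrated with a deliberately simplified sketch. This is not the iBOT method (which uses masked-image modeling with self-distillation on ViT backbones); as a stand-in, a linear "encoder" is learned from unlabeled data alone via PCA, and a downstream classifier is then fitted on a small labeled set in the learned representation space. All names and data here are synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: "pre-training" on unlabeled data (no labels used).
# Stand-in for SSL: learn a 2-D linear representation via PCA.
unlabeled = rng.normal(size=(500, 20))
unlabeled += rng.normal(size=(500, 1)) * np.linspace(1.0, 2.0, 20)  # shared structure
mean = unlabeled.mean(axis=0)
_, _, vt = np.linalg.svd(unlabeled - mean, full_matrices=False)
encoder = vt[:2].T  # top-2 principal directions act as the "pre-trained" encoder

def encode(x):
    """Project raw inputs into the representation learned without labels."""
    return (x - mean) @ encoder

# Stage 2: adapt to a small labeled downstream task (nearest class centroid).
x_train = rng.normal(size=(40, 20))
y_train = np.repeat([0, 1], 20)
x_train[y_train == 1] += 3.0  # synthetic class separation
centroids = np.stack(
    [encode(x_train[y_train == c]).mean(axis=0) for c in (0, 1)]
)

def predict(x):
    """Classify by nearest centroid in the learned representation space."""
    dists = np.linalg.norm(encode(x)[:, None, :] - centroids[None], axis=-1)
    return dists.argmin(axis=1)

acc = (predict(x_train) == y_train).mean()
```

The point of the sketch is the division of labor: the encoder is fixed from unlabeled data only, and the labeled downstream set is used solely to fit the lightweight classifier on top, mirroring the pre-training/fine-tuning split evaluated in the paper.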
URL
https://arxiv.org/abs/2307.01645