Abstract
In recent years, convolutional neural networks (CNNs) have made significant progress in computer vision. These advances have been applied to other areas, such as remote sensing, with satisfactory results. However, the lack of large labeled datasets and the inherent complexity of remote sensing problems have made it difficult to train deep CNNs for dense prediction problems. To address this issue, ImageNet-pretrained weights have been used as a starting point in various dense prediction tasks. Although this form of transfer learning has led to improvements, the domain gap between natural and remote sensing images has also limited the performance of deep CNNs. Meanwhile, self-supervised methods for learning visual representations from large collections of unlabeled images have grown substantially over the past two years. Accordingly, in this paper we explore the effectiveness of in-domain representations, in both supervised and self-supervised forms, to bridge the domain gap between remote sensing imagery and the ImageNet dataset. The weights obtained from remote sensing images are used to initialize models for semantic segmentation and object detection tasks, and state-of-the-art results are obtained. For self-supervised pre-training, we use the SimSiam algorithm, as it is simple and does not require large computational resources. One of the most influential factors in acquiring general visual representations from remote sensing images is the pre-training dataset. To examine its effect, we pre-train on remote sensing datasets of equal size. Our results demonstrate that using datasets with high spatial resolution for self-supervised representation learning leads to strong performance on downstream tasks.
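The SimSiam objective mentioned above can be summarized as a symmetrized negative cosine similarity between a predictor output and a stop-gradient projection of the other augmented view. The following is a minimal NumPy sketch of that loss computation only (the encoder, predictor, and stop-gradient mechanics of a real training loop are omitted); the function names are illustrative, not from the paper.

```python
import numpy as np

def neg_cosine(p, z):
    """Negative mean cosine similarity between row vectors of p and z.

    In SimSiam, z comes from the stop-gradient branch; since this sketch
    only computes the loss value, no gradient handling is needed here.
    """
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -np.mean(np.sum(p * z, axis=1))

def simsiam_loss(p1, z1, p2, z2):
    """Symmetrized SimSiam loss: L = D(p1, z2)/2 + D(p2, z1)/2.

    p1, p2: predictor outputs for the two augmented views.
    z1, z2: projector outputs for the two views (treated as constants).
    """
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```

When the predictor output exactly matches the other view's projection, the loss reaches its minimum of -1, which is the collapse-free optimum SimSiam trains toward.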
URL
https://arxiv.org/abs/2301.12541