
Supervised and Contrastive Self-Supervised In-Domain Representation Learning for Dense Prediction Problems in Remote Sensing


Abstract

In recent years, convolutional neural networks (CNNs) have made significant progress in computer vision. These advances have been applied to other areas, such as remote sensing, with satisfactory results. However, the lack of large labeled datasets and the inherent complexity of remote sensing problems have made it difficult to train deep CNNs for dense prediction problems. To address this issue, ImageNet pre-trained weights have been used as a starting point in various dense prediction tasks. Although this type of transfer learning has led to improvements, the domain gap between natural and remote sensing images has also limited the performance of deep CNNs. On the other hand, self-supervised learning methods for learning visual representations from large unlabeled image collections have grown substantially over the past two years. Accordingly, in this paper we explore the effectiveness of in-domain representations, in both supervised and self-supervised forms, to bridge the domain gap between remote sensing images and the ImageNet dataset. The weights obtained from remote sensing images are used as initial weights for semantic segmentation and object detection tasks, and state-of-the-art results are obtained. For self-supervised pre-training, we use the SimSiam algorithm because it is simple and does not require large computational resources. One of the most influential factors in acquiring general visual representations from remote sensing images is the pre-training dataset. To examine its effect, equal-sized remote sensing datasets are used for pre-training. Our results demonstrate that using datasets with a high spatial resolution for self-supervised representation learning leads to high performance in downstream tasks.
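For illustration only, the sketch below shows what a minimal SimSiam-style pre-training step of the kind described above might look like in PyTorch. The backbone choice (ResNet-50), the projector/predictor dimensions, and the dummy inputs are assumptions made for this example, not the authors' exact configuration; in the paper, the weights learned this way would then initialize models for semantic segmentation and object detection.

```python
# Minimal SimSiam-style pre-training sketch (PyTorch). Illustrative only:
# backbone, MLP sizes, and inputs are assumptions, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SimSiam(nn.Module):
    def __init__(self, proj_dim=2048, pred_dim=512):
        super().__init__()
        # ResNet-50 backbone without its classification head (assumed choice)
        backbone = torchvision.models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()
        self.backbone = backbone
        # Projection MLP mapping backbone features to the embedding space
        self.projector = nn.Sequential(
            nn.Linear(feat_dim, proj_dim), nn.BatchNorm1d(proj_dim), nn.ReLU(inplace=True),
            nn.Linear(proj_dim, proj_dim), nn.BatchNorm1d(proj_dim),
        )
        # Prediction MLP applied to one branch only
        self.predictor = nn.Sequential(
            nn.Linear(proj_dim, pred_dim), nn.BatchNorm1d(pred_dim), nn.ReLU(inplace=True),
            nn.Linear(pred_dim, proj_dim),
        )

    def forward(self, x1, x2):
        z1 = self.projector(self.backbone(x1))
        z2 = self.projector(self.backbone(x2))
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # Stop-gradient on the target embeddings, as in SimSiam
        return p1, p2, z1.detach(), z2.detach()

def simsiam_loss(p1, p2, z1, z2):
    # Negative cosine similarity, symmetrized over the two views
    d = lambda p, z: -F.cosine_similarity(p, z, dim=-1).mean()
    return 0.5 * d(p1, z2) + 0.5 * d(p2, z1)

# Usage sketch: x1 and x2 stand in for two random augmentations
# of the same unlabeled remote sensing tile.
model = SimSiam()
x1, x2 = torch.randn(8, 3, 224, 224), torch.randn(8, 3, 224, 224)
p1, p2, z1, z2 = model(x1, x2)
loss = simsiam_loss(p1, p2, z1, z2)
loss.backward()
```

After pre-training, only the backbone weights would be kept and used to initialize the encoder of a segmentation or detection network, in place of ImageNet weights.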


URL

https://arxiv.org/abs/2301.12541

PDF

https://arxiv.org/pdf/2301.12541.pdf

