Paper Reading AI Learner

In-Domain Self-Supervised Learning Can Lead to Improvements in Remote Sensing Image Classification

2023-07-04 10:57:52
Ivica Dimitrovski, Ivan Kitanovski, Nikola Simidjievski, Dragi Kocev

Abstract

Self-supervised learning (SSL) has emerged as a promising approach for remote sensing image classification due to its ability to leverage large amounts of unlabeled data. In contrast to traditional supervised learning, SSL aims to learn representations of data without the need for explicit labels. This is achieved by formulating auxiliary tasks that can be used to create pseudo-labels for the unlabeled data and to learn pre-trained models. The pre-trained models can then be fine-tuned on downstream tasks such as remote sensing image scene classification. The paper analyzes the effectiveness of SSL pre-training using Million-AID, a large unlabeled remote sensing dataset, on various remote sensing image scene classification datasets as downstream tasks. More specifically, we evaluate the effectiveness of SSL pre-training using the iBOT framework coupled with Vision Transformers (ViT), in contrast to supervised pre-training of ViT using the ImageNet dataset. The comprehensive experimental work across 14 datasets with diverse properties reveals that in-domain SSL leads to improved predictive performance of models compared to their supervised counterparts.
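The pre-train-then-fine-tune workflow described above can be sketched in PyTorch. This is a minimal illustration, not the paper's implementation: a tiny ViT-style encoder stands in for the iBOT-pre-trained ViT backbone, random tensors stand in for a labelled downstream scene-classification dataset, and names such as `TinyViT` and `num_scene_classes` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """A small ViT-style encoder: patch embedding + transformer + CLS pooling.
    Stands in for the SSL-pre-trained backbone (weights here are random)."""
    def __init__(self, image_size=32, patch_size=8, dim=64, depth=2, heads=4):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, x):
        x = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim) patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed
        return self.encoder(x)[:, 0]                        # CLS representation

# Backbone assumed pre-trained with SSL; attach a fresh classification head
# for the downstream scene-classification task.
backbone = TinyViT()
num_scene_classes = 10                      # hypothetical number of scene categories
model = nn.Sequential(backbone, nn.Linear(64, num_scene_classes))

# Fine-tune end-to-end on (dummy) labelled downstream data.
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, num_scene_classes, (8,))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
for _ in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()

logits = model(images)                      # one logit per scene class
```

The same fine-tuning loop applies whether the backbone's initial weights came from iBOT-style in-domain SSL or from supervised ImageNet pre-training; the paper's comparison varies only that initialization.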

URL

https://arxiv.org/abs/2307.01645

PDF

https://arxiv.org/pdf/2307.01645.pdf
