Generic Knowledge Boosted Pre-training For Remote Sensing Images

2024-01-09 15:36:07
Ziyue Huang, Mingming Zhang, Yuan Gong, Qingjie Liu, Yunhong Wang

Abstract

Deep learning models are essential for scene classification, change detection, land cover segmentation, and other remote sensing image understanding tasks. The backbones of most existing remote sensing deep learning models are initialized with weights obtained from ImageNet pre-training (IMP). However, a domain gap exists between remote sensing images and natural images (e.g., ImageNet), so models initialized with IMP weights perform poorly on remote sensing image understanding. Although some pre-training methods have been studied in the remote sensing community, current remote sensing pre-training methods suffer from limited generalization because they use only remote sensing images. In this paper, we propose a novel remote sensing pre-training framework, Generic Knowledge Boosted Remote Sensing Pre-training (GeRSP), to learn robust representations from both remote sensing and natural images for remote sensing understanding tasks. GeRSP contains two pre-training branches: (1) a self-supervised pre-training branch that learns domain-related representations from unlabeled remote sensing images, and (2) a supervised pre-training branch that learns general knowledge from labeled natural images. GeRSP combines the two branches in a teacher-student architecture so that representations carrying both general and domain-specific knowledge are learned simultaneously, producing a strong pre-trained model for initializing deep learning models. Finally, we evaluate GeRSP and other remote sensing pre-training methods on three downstream tasks: object detection, semantic segmentation, and scene classification. Extensive experimental results consistently demonstrate that GeRSP learns robust representations in a unified manner and improves performance on remote sensing downstream tasks.
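
The two-branch, teacher-student setup described in the abstract can be illustrated with a small sketch. The abstract does not state which losses are used or how the teacher is updated, so everything concrete here is an assumption made for illustration: an InfoNCE-style contrastive loss for the unlabeled remote sensing branch, a cross-entropy loss for the labeled natural-image branch, an exponential moving average (EMA) update for the teacher, and a hypothetical `weight` that balances the two losses. All function names are hypothetical and are not taken from the GeRSP code.

```python
import math


def ema_update(teacher, student, momentum=0.99):
    """Assumed teacher update: move each teacher weight toward the student weight."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (numerically stable log-sum-exp)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(query, positive, negatives, temperature=0.2):
    """Assumed self-supervised loss: the student query should match the teacher's
    positive key and differ from the negative keys."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    return cross_entropy(logits, 0)  # index 0 holds the positive pair


def combined_loss(rs_query, rs_positive, rs_negatives, nat_logits, nat_label, weight=1.0):
    """Sum of the remote sensing (self-supervised) and natural-image (supervised) losses.
    `weight` is a hypothetical balancing factor, not a value from the paper."""
    self_supervised = info_nce(rs_query, rs_positive, rs_negatives)
    supervised = cross_entropy(nat_logits, nat_label)
    return self_supervised + weight * supervised


if __name__ == "__main__":
    # Toy embeddings and logits, purely illustrative.
    loss = combined_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], [2.0, 0.5, 0.1], 0)
    print(f"combined loss: {loss:.4f}")
    print(ema_update([1.0, 1.0], [0.0, 2.0], momentum=0.9))
```

In a real training loop the student would be updated by gradient descent on the combined loss, and the teacher would then be refreshed from the student after each step. The sketch only shows how the two signals could be joined into one objective.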

URL

https://arxiv.org/abs/2401.04614

PDF

https://arxiv.org/pdf/2401.04614.pdf

