Paper Reading AI Learner

MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

2024-03-20 09:17:22
Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao, Liangpei Zhang

Abstract

Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks. Pretraining is an active research topic, encompassing supervised and self-supervised learning methods that initialize model weights effectively. However, transferring pretrained models to downstream tasks may suffer from task discrepancy, since pretraining is typically formulated as an image classification or object discrimination task. In this study, we explore the Multi-Task Pretraining (MTP) paradigm for RS foundation models to address this issue. Using a shared-encoder, task-specific-decoder architecture, we conduct multi-task supervised pretraining on the SAMRS dataset, covering semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural network and vision transformer foundation models with over 300 million parameters. The pretrained models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection. Extensive experiments across 14 datasets demonstrate that our models outperform existing models of similar size and perform competitively against larger state-of-the-art models, validating the effectiveness of MTP.
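The abstract describes a shared encoder feeding several task-specific decoders whose supervised losses are combined during pretraining. The sketch below illustrates only that structural idea with plain Python stand-ins; all class names, loss weights, and the toy loss function are illustrative assumptions, not the authors' implementation.

```python
# Minimal structural sketch of Multi-Task Pretraining (MTP): one shared
# encoder is run once per image, several task-specific decoders each
# compute a supervised loss, and the losses are summed for one step.
# Everything here (names, weights, the |feature - target| loss) is a
# toy stand-in, not the paper's actual model or training code.

class SharedEncoder:
    def forward(self, image):
        # Stand-in for a CNN / ViT backbone: reduce the "image" to one feature.
        return sum(image)

class TaskDecoder:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight  # hypothetical per-task loss weight

    def loss(self, feature, target):
        # Stand-in for a task loss (segmentation, rotated detection, ...).
        return self.weight * abs(feature - target)

def mtp_step(encoder, decoders, image, targets):
    """One pretraining step: encode once, then sum task-specific losses."""
    feature = encoder.forward(image)
    return sum(d.loss(feature, targets[d.name]) for d in decoders)

encoder = SharedEncoder()
decoders = [
    TaskDecoder("semantic_segmentation", 1.0),
    TaskDecoder("instance_segmentation", 1.0),
    TaskDecoder("rotated_object_detection", 1.0),
]
total = mtp_step(
    encoder, decoders, image=[1, 2, 3],
    targets={"semantic_segmentation": 5,
             "instance_segmentation": 7,
             "rotated_object_detection": 4},
)
print(total)  # |6-5| + |6-7| + |6-4| = 4
```

The point of the sketch is the control flow: the encoder runs once and its output is shared across decoders, so after pretraining the decoders can be discarded and the encoder alone is finetuned on downstream tasks.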

URL

https://arxiv.org/abs/2403.13430

PDF

https://arxiv.org/pdf/2403.13430.pdf

