SpectralGPT: Spectral Foundation Model

2023-11-13 07:09:30
Danfeng Hong, Bing Zhang, Xuyang Li, Yuxuan Li, Chenyu Li, Jing Yao, Naoto Yokoya, Hao Li, Xiuping Jia, Antonio Plaza, Paolo Gamba, Jon Atli Benediktsson, Jocelyn Chanussot

Abstract

The foundation model has recently garnered significant attention due to its potential to revolutionize the field of visual representation learning in a self-supervised manner. While most foundation models are tailored to effectively process RGB images for various visual tasks, there is a noticeable gap in research focused on spectral data, which offers valuable information for scene understanding, especially in remote sensing (RS) applications. To fill this gap, we created for the first time a universal RS foundation model, named SpectralGPT, which is purpose-built to handle spectral RS images using a novel 3D generative pretrained transformer (GPT). Compared to existing foundation models, SpectralGPT 1) accommodates input images with varying sizes, resolutions, time series, and regions in a progressive training fashion, enabling full utilization of extensive RS big data; 2) leverages 3D token generation for spatial-spectral coupling; 3) captures spectrally sequential patterns via multi-target reconstruction; 4) trains on one million spectral RS images, yielding models with over 600 million parameters. Our evaluation highlights significant performance improvements with pretrained SpectralGPT models, signifying substantial potential in advancing spectral RS big data applications within the field of geoscience across four downstream tasks: single/multi-label scene classification, semantic segmentation, and change detection.
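
Example (illustrative sketch, not the authors' code)

The abstract mentions 3D token generation for spatial-spectral coupling and reconstruction-based pretraining. The snippet below is a minimal NumPy sketch of one way such tokens could be formed: a spectral cube of shape (bands, height, width) is cut into non-overlapping 3D blocks, and a random subset of tokens is hidden, as is common in masked reconstruction pretraining. The token size (3 bands x 8 x 8 pixels), the 12-band 96 x 96 input, the 90% mask ratio, and the function names tokenize_3d and random_mask are all assumptions made for illustration; none of them are stated in the abstract.

```python
import numpy as np


def tokenize_3d(cube, band_size=3, patch_size=8):
    """Split a (bands, height, width) cube into non-overlapping 3D tokens.

    band_size and patch_size are illustrative assumptions, not values
    taken from the abstract.
    """
    c, h, w = cube.shape
    if c % band_size or h % patch_size or w % patch_size:
        raise ValueError("cube dimensions must be divisible by the token size")
    nc, nh, nw = c // band_size, h // patch_size, w // patch_size
    # Expose the block structure along every axis ...
    blocks = cube.reshape(nc, band_size, nh, patch_size, nw, patch_size)
    # ... then bring the three block indices to the front.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    # One row per token; each token couples several bands with a spatial patch.
    return blocks.reshape(nc * nh * nw, band_size * patch_size * patch_size)


def random_mask(num_tokens, mask_ratio, rng):
    """Return a boolean mask in which True marks tokens hidden from the encoder."""
    num_masked = int(round(num_tokens * mask_ratio))
    order = rng.permutation(num_tokens)
    mask = np.zeros(num_tokens, dtype=bool)
    mask[order[:num_masked]] = True
    return mask


rng = np.random.default_rng(0)
cube = rng.random((12, 96, 96)).astype(np.float32)  # hypothetical 12-band image
tokens = tokenize_3d(cube)
mask = random_mask(len(tokens), mask_ratio=0.9, rng=rng)  # assumed mask ratio
visible = tokens[~mask]  # tokens an encoder would see
targets = tokens[mask]   # tokens a decoder would be asked to reconstruct
```

Because each token spans several adjacent bands as well as a spatial patch, a model operating on these tokens sees spatial and spectral context jointly rather than treating each band as an independent image.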

URL

https://arxiv.org/abs/2311.07113

PDF

https://arxiv.org/pdf/2311.07113.pdf

