Paper Reading AI Learner

U-TILISE: A Sequence-to-sequence Model for Cloud Removal in Optical Satellite Time Series

2023-05-22 17:37:10
Corinne Stucker, Vivien Sainte Fare Garnot, Konrad Schindler

Abstract

Satellite image time series in the optical and infrared spectrum suffer from frequent data gaps due to cloud cover, cloud shadows, and temporary sensor outages. It has been a long-standing problem of remote sensing research how to best reconstruct the missing pixel values and obtain complete, cloud-free image sequences. We approach that problem from the perspective of representation learning and develop U-TILISE, an efficient neural model that is able to implicitly capture spatio-temporal patterns of the spectral intensities, and that can therefore be trained to map a cloud-masked input sequence to a cloud-free output sequence. The model consists of a convolutional spatial encoder that maps each individual frame of the input sequence to a latent encoding; an attention-based temporal encoder that captures dependencies between those per-frame encodings and lets them exchange information along the time dimension; and a convolutional spatial decoder that decodes the latent embeddings back into multi-spectral images. We experimentally evaluate the proposed model on EarthNet2021, a dataset of Sentinel-2 time series acquired all over Europe, and demonstrate its superior ability to reconstruct the missing pixels. Compared to a standard interpolation baseline, it increases the PSNR by 1.8 dB at previously seen locations and by 1.3 dB at unseen locations.
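The encoder–attention–decoder pipeline described above can be illustrated with a schematic sketch. This is not the authors' implementation: the real model uses convolutional spatial encoders/decoders and a learned multi-head temporal attention, whereas the toy functions below (with hypothetical names and randomly initialized projections) only demonstrate the data flow — per-frame encoding, information exchange along the time axis with cloudy frames masked out as attention keys, and decoding back to multi-spectral frames.

```python
import numpy as np

def spatial_encode(frames, d_latent=8, seed=0):
    """Stand-in for the per-frame convolutional spatial encoder:
    a shared linear projection from spectral bands to a latent space."""
    rng = np.random.default_rng(seed)
    T, C, H, W = frames.shape
    Wp = rng.standard_normal((d_latent, C)) / np.sqrt(C)
    # (T, C, H, W) -> (T, d_latent, H, W)
    return np.einsum('dc,tchw->tdhw', Wp, frames)

def temporal_attention(z, clear_mask):
    """Stand-in for the attention-based temporal encoder: each frame
    attends to all frames, but cloud-masked frames are excluded as
    keys, so clear frames propagate information into the gaps."""
    T = z.shape[0]
    q = z.reshape(T, -1)                       # flatten each frame
    scores = q @ q.T / np.sqrt(q.shape[1])     # (T, T) similarities
    scores[:, ~clear_mask] = -np.inf           # masked frames give no info
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)          # softmax over time
    # mix latent frames along the time dimension
    return np.einsum('ts,sdhw->tdhw', w, z)

def spatial_decode(z, n_bands=4, seed=1):
    """Stand-in for the convolutional spatial decoder: project the
    latent embeddings back to multi-spectral images."""
    rng = np.random.default_rng(seed)
    T, D, H, W = z.shape
    Wd = rng.standard_normal((n_bands, D)) / np.sqrt(D)
    return np.einsum('bd,tdhw->tbhw', Wd, z)

# Toy sequence: 5 frames, 4 spectral bands, 16x16 pixels;
# frames 1 and 3 are cloud-masked in the input.
x = np.random.default_rng(2).random((5, 4, 16, 16))
clear = np.array([True, False, True, False, True])
y = spatial_decode(temporal_attention(spatial_encode(x), clear))
print(y.shape)  # the output sequence keeps the input shape
```

Note how even the masked frames receive a finite reconstruction, because the temporal attention lets them borrow content from the clear frames — the same mechanism that allows the trained model to map a cloud-masked sequence to a cloud-free one.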

Abstract (translated)

Satellite image time series in the optical and infrared spectrum frequently exhibit data gaps due to cloud cover, cloud shadows, and temporary sensor outages. How best to reconstruct the missing pixel values and obtain complete, cloud-free image sequences has been a long-standing problem in remote sensing research. We approach this problem from the perspective of representation learning and develop U-TILISE, an efficient neural model that implicitly captures the spatio-temporal patterns of the spectral intensities and can therefore be trained to map a cloud-masked input sequence to a cloud-free output sequence. The model consists of a convolutional spatial encoder that maps each frame of the input sequence to a latent encoding, an attention-based temporal encoder that captures the dependencies between these per-frame encodings and lets them exchange information along the time dimension, and a convolutional spatial decoder that decodes the latent embeddings back into multi-spectral images. We experimentally evaluate the model on EarthNet2021, a dataset of Sentinel-2 time series acquired across Europe, and demonstrate its superior ability to reconstruct the missing pixels. Compared to a standard interpolation baseline, it increases the PSNR by 1.8 dB at previously seen locations and by 1.3 dB at unseen locations.

URL

https://arxiv.org/abs/2305.13277

PDF

https://arxiv.org/pdf/2305.13277.pdf

