Abstract
Satellite image time series in the optical and infrared spectrum suffer from frequent data gaps due to cloud cover, cloud shadows, and temporary sensor outages. How best to reconstruct the missing pixel values and obtain complete, cloud-free image sequences has been a long-standing problem in remote sensing research. We approach that problem from the perspective of representation learning and develop U-TILISE, an efficient neural model that implicitly captures spatio-temporal patterns of the spectral intensities and can therefore be trained to map a cloud-masked input sequence to a cloud-free output sequence. The model consists of a convolutional spatial encoder that maps each individual frame of the input sequence to a latent encoding; an attention-based temporal encoder that captures dependencies between those per-frame encodings and lets them exchange information along the time dimension; and a convolutional spatial decoder that decodes the latent embeddings back into multi-spectral images. We experimentally evaluate the proposed model on EarthNet2021, a dataset of Sentinel-2 time series acquired all over Europe, and demonstrate its superior ability to reconstruct the missing pixels. Compared to a standard interpolation baseline, it increases the PSNR by 1.8 dB at previously seen locations and by 1.3 dB at unseen locations.
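The encoder–temporal-attention–decoder pipeline described above can be sketched in a minimal form. This is not the authors' implementation: the paper uses convolutional (U-Net-style) spatial encoders/decoders, whereas the sketch below substitutes per-pixel linear maps (equivalent to 1×1 convolutions) and single-head self-attention over the time axis; all dimensions and weight initializations are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
# Hypothetical sizes: T frames, C spectral bands, HxW pixels, D latent channels.
T, C, H, W, D = 5, 4, 8, 8, 16

# Toy random weights; a real model learns these from cloud-masked sequences.
W_enc = rng.normal(size=(C, D)) * 0.1  # spatial encoder (1x1-conv-like)
W_q = rng.normal(size=(D, D)) * 0.1    # attention query projection
W_k = rng.normal(size=(D, D)) * 0.1    # attention key projection
W_v = rng.normal(size=(D, D)) * 0.1    # attention value projection
W_dec = rng.normal(size=(D, C)) * 0.1  # spatial decoder back to bands

def utilise_sketch(seq):
    """Map a (T, H, W, C) masked sequence to a (T, H, W, C) reconstruction."""
    z = seq @ W_enc                                 # per-frame encodings (T, H, W, D)
    zt = z.reshape(T, -1, D).transpose(1, 0, 2)     # (H*W, T, D): attend over time
    q, k, v = zt @ W_q, zt @ W_k, zt @ W_v
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(D))  # (H*W, T, T) weights
    zt = attn @ v                                   # frames exchange information in time
    z = zt.transpose(1, 0, 2).reshape(T, H, W, D)
    return z @ W_dec                                # decode latents to spectra

x = rng.normal(size=(T, H, W, C))  # stand-in for a cloud-masked Sentinel-2 clip
y = utilise_sketch(x)
print(y.shape)  # (5, 8, 8, 4)
```

Because the attention weights are computed per pixel across all time steps, every output frame can draw on cloud-free observations of the same pixel elsewhere in the sequence, which is the mechanism that lets the model fill gaps.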
Abstract (translated)
Satellite image time series in the optical and infrared spectrum frequently suffer from data gaps due to cloud cover, cloud shadows, and temporary sensor outages. How best to reconstruct the missing pixel values and obtain complete, cloud-free image sequences is a long-standing problem. We approach this problem from the perspective of representation learning and develop U-TILISE, an efficient neural model that implicitly captures the spatial and temporal patterns of the spectral intensities and can therefore be trained to map a cloud-masked input sequence to a cloud-free output sequence. The model consists of a convolutional spatial encoder that maps each frame of the input sequence to a latent encoding; an attention-based temporal encoder that captures dependencies between those per-frame encodings and lets them exchange information along the time dimension; and a convolutional spatial decoder that decodes the latent encodings back into multi-spectral images. We experimentally evaluate the model on EarthNet2021, a dataset of Sentinel-2 time series covering locations all over Europe, and demonstrate its ability to reconstruct missing pixels. Compared to a standard interpolation baseline, it improves PSNR by 1.8 dB at previously seen locations and by 1.3 dB at unseen locations.
URL
https://arxiv.org/abs/2305.13277