Abstract
In recent years, convolutional neural networks (CNNs) have achieved remarkable advances in remote sensing image super-resolution. The textures and structures of remote sensing images (RSIs) are complex and variable, often repeating within a single image while differing across images. Current deep learning-based super-resolution models pay less attention to high-frequency features, which leads to suboptimal performance in capturing contours, textures, and spatial information. State-of-the-art CNN-based methods now focus on feature extraction from RSIs using attention mechanisms. However, these methods still cannot effectively identify and exploit the key content attention signals in RSIs. To address this problem, we propose an advanced feature extraction module called Channel and Spatial Attention Feature Extraction (CSA-FE), which extracts features effectively by incorporating channel and spatial attention into the standard vision transformer (ViT). The proposed method was trained on the UCMerced dataset at scales 2, 3, and 4. The experimental results show that our method helps the model focus on the specific channels and spatial locations that contain high-frequency information, so that it attends to relevant features and suppresses irrelevant ones, enhancing the quality of the super-resolved images. Our model achieves superior performance compared to various existing models.
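The abstract does not specify the internal layers of CSA-FE, so the following is only a minimal NumPy sketch of the general idea it describes: a CBAM-style sequential channel-then-spatial attention that gates feature-map channels and spatial locations, so that regions carrying high-frequency content (edges, textures) can be emphasized. The pooling choices and the gating function here are assumptions, not the paper's actual learned module, and the ViT integration is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x):
    # x: (C, H, W). Squeeze the spatial dimensions with global average and
    # max pooling, then gate each channel. (A CBAM-style stand-in; the
    # paper's exact CSA-FE layers are not given in the abstract.)
    avg = x.mean(axis=(1, 2))          # (C,)
    mx = x.max(axis=(1, 2))            # (C,)
    w = sigmoid(avg + mx)              # (C,) per-channel gate in (0, 1)
    return x * w[:, None, None]

def spatial_attention(x):
    # x: (C, H, W). Squeeze the channel dimension, then gate each spatial
    # location, letting the model weight pixels that carry more signal.
    avg = x.mean(axis=0)               # (H, W)
    mx = x.max(axis=0)                 # (H, W)
    w = sigmoid(avg + mx)              # (H, W) per-pixel gate in (0, 1)
    return x * w[None, :, :]

def csa_block(x):
    # Channel attention followed by spatial attention; the output keeps the
    # input shape, so the block can be dropped into a feature extractor.
    return spatial_attention(channel_attention(x))

feat = np.random.randn(8, 16, 16)      # a toy (C, H, W) feature map
out = csa_block(feat)
print(out.shape)                       # shape is preserved: (8, 16, 16)
```

In a learned version, the pooled descriptors would pass through small trainable layers (e.g. a shared MLP for the channel branch and a convolution for the spatial branch) before the gating, which is what lets the network decide which channels and locations hold the high-frequency information.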
URL
https://arxiv.org/abs/2405.04595