Abstract
The accurate geo-localization of ground-view images now plays an important role across domains as diverse as journalism, forensic analysis, transportation, and Earth Observation. This work addresses the problem of matching a query ground-view image with the corresponding satellite image in the absence of GPS data. This is done by comparing features from the ground-view image and the satellite image, innovatively leveraging the satellite image's semantic segmentation mask through a three-stream Siamese-like network. The proposed method, Semantic Align Net (SAN), targets both limited Field-of-View (FoV) images and ground panoramas (images with a FoV of 360°). The novelty lies in fusing satellite images with their semantic segmentation masks, so that the model extracts useful features and focuses on the significant parts of the images. Through this semantic analysis, SAN improves performance on the unlabelled CVUSA dataset for all tested FoVs.
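The abstract describes retrieval-style matching: a descriptor extracted from the ground-view query is compared against descriptors extracted from satellite images (and their segmentation masks) by the three-stream network. The following is a minimal sketch, not the authors' code, of just the final retrieval step, assuming the network has already produced fixed-length descriptors; the toy vectors and the `retrieve` helper are illustrative only.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length descriptors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query, gallery):
    # Return the index of the satellite descriptor most similar
    # to the ground-view query descriptor.
    scores = [cosine(query, g) for g in gallery]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy descriptors (hypothetical): the second satellite entry best
# matches the ground-view query.
gallery = [[1.0, 0.0, 0.0], [0.9, 0.4, 0.1], [0.0, 1.0, 0.0]]
query = [0.8, 0.5, 0.1]
print(retrieve(query, gallery))  # → 1
```

In practice such methods rank the whole satellite gallery by similarity and report top-k recall; the sketch returns only the top-1 index for brevity.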
URL
https://arxiv.org/abs/2404.11302