Abstract
Worldwide geo-localization is the task of determining the geographic location of images captured anywhere on Earth, typically guided by geographic cues such as climate, landmarks, and architectural styles. Geo-localization models such as GeoCLIP, which aligns images with locations via contrastive learning, have improved prediction accuracy, but their interpretability remains insufficiently explored. Current concept-based interpretability methods do not align well with the image-location embedding alignment objective of these models, resulting in suboptimal interpretability and performance. To address this gap, we propose a novel framework that integrates global geo-localization with concept bottlenecks. Our method inserts a Concept-Aware Alignment Module that jointly projects image and location embeddings onto a shared bank of geographic concepts (e.g., tropical climate, mountain, cathedral) and minimizes a concept-level loss, which strengthens alignment in a concept-specific subspace and enables robust interpretability. To our knowledge, this is the first work to introduce interpretability into geo-localization. Extensive experiments demonstrate that our approach surpasses GeoCLIP in geo-localization accuracy and improves performance across diverse geospatial prediction tasks, while revealing richer semantic insights into geographic decision-making processes.
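The abstract does not give the form of the concept-level loss, so the following is only a minimal sketch of the general idea under stated assumptions: image and location embeddings are scored against a shared concept bank by cosine similarity, and a symmetric contrastive (InfoNCE-style) loss is applied to the resulting concept-score vectors. The function names, the temperature value, and the random stand-in embeddings are all hypothetical and are not taken from the paper or from GeoCLIP.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def concept_scores(embeddings, concept_bank):
    """Cosine similarity of every embedding to every concept: (batch, n_concepts)."""
    return l2_normalize(embeddings) @ l2_normalize(concept_bank).T

def concept_alignment_loss(image_emb, location_emb, concept_bank, temperature=0.1):
    """Assumed loss: symmetric InfoNCE computed in concept-score space.

    Row i of image_emb and row i of location_emb are treated as a matched pair.
    """
    img_c = l2_normalize(concept_scores(image_emb, concept_bank))
    loc_c = l2_normalize(concept_scores(location_emb, concept_bank))
    logits = img_c @ loc_c.T / temperature  # (batch, batch) pairwise similarities

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-location and location-to-image directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))

# Toy usage with random stand-ins (hypothetical data, not real encoders).
rng = np.random.default_rng(0)
concept_names = ["tropical climate", "mountain", "cathedral"]
concept_bank = rng.normal(size=(len(concept_names), 8))
image_emb = rng.normal(size=(4, 8))
location_emb = image_emb + 0.01 * rng.normal(size=(4, 8))

scores = concept_scores(image_emb, concept_bank)
loss = concept_alignment_loss(image_emb, location_emb, concept_bank)
print("concept scores shape:", scores.shape)
print("concept-level loss:", float(loss))
```

In a sketch like this, the per-concept scores are what would make a prediction inspectable: each image or location is summarized by how strongly it matches each named concept, and the loss only compares those summaries.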
URL
https://arxiv.org/abs/2509.01910