OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata

2025-09-22 19:22:32
Oussema Dhaouadi, Riccardo Marin, Johannes Meier, Jacques Kaiser, Daniel Cremers

Abstract

Accurate visual localization from aerial views is a fundamental problem with applications in mapping, large-area inspection, and search-and-rescue operations. In many scenarios, these systems require high-precision localization while operating with limited resources (e.g., no internet connection or GNSS/GPS support), making large image databases or heavy 3D models impractical. Surprisingly, little attention has been given to leveraging orthographic geodata as an alternative paradigm, which is lightweight and increasingly available through free releases by governmental authorities (e.g., the European Union). To fill this gap, we propose OrthoLoC, the first large-scale dataset comprising 16,425 UAV images from Germany and the United States with multiple modalities. The dataset addresses domain shifts between UAV imagery and geospatial data. Its paired structure enables fair benchmarking of existing solutions by decoupling image retrieval from feature matching, allowing isolated evaluation of localization and calibration performance. Through comprehensive evaluation, we examine the impact of domain shifts, data resolutions, and covisibility on localization accuracy. Finally, we introduce a refinement technique called AdHoP, which can be integrated with any feature matcher, improving matching by up to 95% and reducing translation error by up to 63%. The dataset and code are available at: this https URL.
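The abstract reports a reduction in translation error but does not define the metric. As a minimal sketch, assuming the common convention of measuring translation error as the Euclidean distance between estimated and ground-truth camera positions and rotation error as the geodesic angle between rotation matrices, the quantities could be computed as follows. The function names are hypothetical and are not taken from the OrthoLoC code.

```python
import math

def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth camera positions."""
    return math.dist(t_est, t_gt)

def rotation_error_deg(R_est, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices (nested lists)."""
    # trace(R_est^T R_gt) equals the sum of element-wise products.
    trace = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def relative_reduction(baseline, refined):
    """Fractional reduction of an error (assumed definition), e.g. 0.63 for 63%."""
    return (baseline - refined) / baseline
```

Under this assumed definition, a baseline translation error of 10.0 m refined to 3.7 m would correspond to a 63% reduction; these numbers are illustrative only and are not results from the paper.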

URL

https://arxiv.org/abs/2509.18350

PDF

https://arxiv.org/pdf/2509.18350.pdf

