
City-wide Street-to-Satellite Image Geolocalization of a Mobile Ground Agent

2022-03-10 19:54:12
Lena M. Downes, Dong-Ki Kim, Ted J. Steiner, Jonathan P. How

Abstract

Cross-view image geolocalization estimates an agent's global position by matching a local ground image to an overhead satellite image, without the need for GPS. Reliably matching a ground image to the correct satellite image is challenging because the two images differ significantly in viewpoint. Existing works have demonstrated localization in constrained scenarios over small areas but have not demonstrated wider-scale localization. Our approach, called Wide-Area Geolocalization (WAG), combines a neural network with a particle filter to achieve global position estimates for agents moving in GPS-denied environments, scaling efficiently to city-scale regions. WAG introduces a trinomial loss function for a Siamese network to robustly match non-centered image pairs, which enables a smaller satellite image database by coarsely discretizing the search area. A modified particle filter weighting scheme is also presented to improve localization accuracy and convergence. Taken together, WAG's network training and particle filter weighting approach achieves city-scale position estimation errors on the order of 20 meters, a 98% reduction compared to a baseline training and weighting approach. Applied to a smaller-scale testing area, WAG reduces the final position estimation error by 64% compared to a state-of-the-art baseline from the literature. WAG's search space discretization also significantly reduces storage and processing requirements.
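The abstract pairs an image-matching network with a particle filter over a coarsely discretized search area. The sketch below illustrates only that general idea and is not the authors' implementation: the 100 m cell size, the Gaussian weighting of embedding distance, the `sigma` value, and all function names are assumptions made for illustration, and the embeddings are toy vectors standing in for Siamese network outputs. The paper's modified weighting scheme and trinomial loss are not reproduced here.

```python
import math

CELL_SIZE_M = 100.0  # assumed coarse grid spacing, not a value from the paper


def tile_index(x, y, cell=CELL_SIZE_M):
    """Map a position in meters to the (col, row) of the satellite tile containing it."""
    return (int(x // cell), int(y // cell))


def embedding_distance(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def measurement_update(particles, weights, ground_emb, tile_embs, sigma=0.5):
    """Reweight particles by how well their tile's embedding matches the ground image.

    Uses a plain Gaussian likelihood on embedding distance (an assumption,
    not the weighting scheme proposed in the paper).
    """
    new_weights = []
    for (x, y), w in zip(particles, weights):
        emb = tile_embs.get(tile_index(x, y))
        if emb is None:
            new_weights.append(0.0)  # particle lies outside the mapped area
            continue
        d = embedding_distance(ground_emb, emb)
        new_weights.append(w * math.exp(-d * d / (2.0 * sigma * sigma)))
    total = sum(new_weights)
    if total == 0.0:
        # Every particle was rejected: fall back to uniform weights.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in new_weights]


def estimate(particles, weights):
    """Weighted mean of particle positions, used as the position estimate."""
    x = sum(w * p[0] for p, w in zip(particles, weights))
    y = sum(w * p[1] for p, w in zip(particles, weights))
    return (x, y)


# Toy usage: two tiles, two particles, and a ground embedding that matches tile (1, 0).
tile_embs = {(0, 0): [0.0, 0.0], (1, 0): [1.0, 0.0]}
particles = [(50.0, 50.0), (150.0, 50.0)]
weights = measurement_update(particles, [0.5, 0.5], [1.0, 0.0], tile_embs)
position = estimate(particles, weights)
```

Because each particle only looks up the tile it falls in, a coarser grid means fewer stored embeddings, which is the storage and processing saving the abstract refers to. A full filter would also need a motion update and resampling step, omitted here.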

URL

https://arxiv.org/abs/2203.05612

PDF

https://arxiv.org/pdf/2203.05612.pdf

