SUES-200: A Multi-height Multi-scene Cross-view Image Benchmark Across Drone and Satellite

2022-04-22 13:49:52
Runzhe Zhu

Abstract

Cross-view image matching aims to match images of the same target scene acquired from different platforms, and thereby help a positioning system infer the location of that scene. With the rapid development of drone technology, how to aid drone positioning and navigation through cross-view matching has become a challenging research topic. However, the accuracy of current cross-view matching models is still low, mainly because existing public datasets do not capture the differences among images taken by drones at different heights, and their scene types are relatively homogeneous, so models cannot adapt to complex and changing scenes. We propose a new cross-view dataset, SUES-200, to address these issues. SUES-200 contains images acquired by a drone at four flight heights, together with the corresponding satellite-view images of the same target scenes. To our knowledge, SUES-200 is the first dataset that accounts for the variation introduced by drone aerial photography at different flight heights. In addition, we build a pipeline for efficient training, testing, and evaluation of cross-view matching models. We then comprehensively evaluate feature extractors with different CNN architectures on SUES-200 through an evaluation system for cross-view matching models, and propose a robust baseline model. The experimental results show that SUES-200 helps a model learn highly discriminative features at different heights. The matching system's evaluation indicators improve as the drone's flight height increases, because the drone's camera pose and the surrounding environment have less influence on aerial photography at higher altitudes.
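The abstract mentions "evaluating indicators" without naming them; Recall@K and average precision are the standard indicators for drone-to-satellite retrieval in cross-view geo-localization benchmarks. The sketch below illustrates that protocol only; the function name, feature shapes, and the assumption of one true satellite match per scene are illustrative, not the authors' exact pipeline.

```python
import numpy as np

def evaluate_cross_view(query_feats, gallery_feats, query_ids, gallery_ids,
                        ks=(1, 5, 10)):
    """Rank satellite gallery images for each drone query by cosine similarity."""
    # L2-normalize features so the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sim = q @ g.T                            # (num_queries, num_gallery)
    order = np.argsort(-sim, axis=1)         # gallery indices, best match first
    hits = gallery_ids[order] == query_ids[:, None]
    # Recall@K: fraction of queries whose true scene appears in the top K.
    recalls = {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}
    # With a single relevant gallery image per scene (assumed here),
    # average precision reduces to 1 / (rank of the true match).
    ranks = hits.argmax(axis=1)              # 0-based rank of the true match
    mean_ap = float(np.mean(1.0 / (ranks + 1)))
    return recalls, mean_ap
```

Given per-image embeddings from any two-branch feature extractor, this yields the Recall@1/5/10 and AP figures typically reported per flight height, which is how height-dependent matching difficulty would show up in the numbers.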

URL

https://arxiv.org/abs/2204.10704

PDF

https://arxiv.org/pdf/2204.10704.pdf

