Aggregated Deep Local Features for Remote Sensing Image Retrieval

2019-03-22 12:17:04
Raffaele Imbriaco, Clint Sebastian, Egor Bondarev, Peter H.N. de With

Abstract

Remote Sensing Image Retrieval remains a challenging topic due to the special nature of Remote Sensing Imagery. Such images contain many different semantic objects, which complicates the retrieval task. In this paper, we present an image retrieval pipeline that uses attentive, local convolutional features and aggregates them using the Vector of Locally Aggregated Descriptors (VLAD) to produce a global descriptor. We study various system parameters, such as the multiplicative and additive attention mechanisms and descriptor dimensionality. We propose a query expansion method that requires no external inputs. Experiments demonstrate that even without training, the local convolutional features and global representation outperform other systems. After system tuning, we can achieve state-of-the-art or competitive results. Furthermore, we observe that our query expansion method increases overall system performance by about 3%, using only the top-three retrieved images. Finally, we show how dimensionality reduction produces compact descriptors with increased retrieval performance and fast retrieval computation times, e.g., 50% faster than current systems.
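
The aggregation and query expansion steps mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (vlad, expand_query), the hard nearest-centroid assignment, the plain L2 normalization, and the averaging form of query expansion are all assumptions made for illustration. The abstract only states that VLAD is used and that expansion relies on the top-three retrieved images, so the paper's actual expansion rule may differ.

```python
import numpy as np


def vlad(local_features, centroids):
    """Aggregate local descriptors (n, d) into one VLAD vector of length k * d.

    Hypothetical helper: hard assignment and a single L2 normalization are
    assumptions, not details taken from the paper.
    """
    # Distance from every local descriptor to every centroid, shape (n, k).
    dists = np.linalg.norm(
        local_features[:, None, :] - centroids[None, :, :], axis=2
    )
    # Index of the nearest centroid for each descriptor.
    assignment = dists.argmin(axis=1)

    k, d = centroids.shape
    residuals = np.zeros((k, d))
    for i in range(k):
        members = local_features[assignment == i]
        if len(members):
            # Sum of residuals between the descriptors and their centroid.
            residuals[i] = (members - centroids[i]).sum(axis=0)

    v = residuals.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def expand_query(query_desc, database, top_k=3):
    """Average query expansion over the top_k retrieved descriptors.

    Assumption: a generic averaging scheme is swapped in here; the paper's
    own query expansion method may combine the results differently.
    """
    # Dot product equals cosine similarity when all vectors are unit-norm.
    sims = database @ query_desc
    top = np.argsort(-sims)[:top_k]
    expanded = query_desc + database[top].sum(axis=0)
    norm = np.linalg.norm(expanded)
    return expanded / norm if norm > 0 else expanded
```

In this sketch, centroids would come from a codebook learned beforehand (for example with k-means over local features), and database holds one unit-norm global descriptor per image. The expanded descriptor is then used to query the database a second time, which needs no input beyond the first round of results.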

URL

https://arxiv.org/abs/1903.09469

PDF

https://arxiv.org/pdf/1903.09469.pdf

