Region Convolutional Features for Multi-Label Remote Sensing Image Retrieval

2018-07-23 14:04:18
Weixun Zhou, Xueqing Deng, Zhenfeng Shao

Abstract

Conventional remote sensing image retrieval (RSIR) systems usually perform single-label retrieval, where each image is annotated with a single label representing its most significant semantic content. This assumption, however, ignores the complexity of remote sensing images, where an image might contain multiple classes (i.e., multiple labels), and thus results in worse retrieval performance. We therefore propose a novel multi-label RSIR approach based on fully convolutional networks (FCN). In our approach, we first train an FCN model using a pixel-wise labeled dataset, and the trained FCN is then used to predict the segmentation map of each image in the considered archive. We finally extract region convolutional features of each image based on its segmentation map. The region features can either be used to perform region-based retrieval or be further post-processed to obtain a feature vector for similarity measurement. The experimental results show that our approach achieves state-of-the-art performance compared with conventional single-label and recent multi-label RSIR approaches.
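
The region feature step described above can be illustrated with a minimal sketch. The abstract does not say how features are pooled within a region or how they are post-processed, so everything here is an assumption made for illustration: masked average pooling per class, a segmentation map already resized to the feature map's spatial size, and L2-normalized concatenation as the image-level vector. The names `region_features` and `image_vector` are hypothetical and are not taken from the paper.

```python
import numpy as np


def region_features(feature_map, seg_map, num_classes):
    """Average-pool convolutional features inside each segmented region.

    feature_map: (H, W, C) array of convolutional activations.
    seg_map:     (H, W) integer array of predicted class labels, assumed
                 to have the same spatial size as feature_map.
    Returns a dict mapping each class present in seg_map to a (C,) vector.
    """
    feats = {}
    for label in range(num_classes):
        mask = seg_map == label
        if mask.any():
            # Boolean indexing selects the (N, C) activations in this region.
            feats[label] = feature_map[mask].mean(axis=0)
    return feats


def image_vector(feats, num_classes, dim):
    """Assumed post-processing: concatenate per-class features (zeros for
    absent classes) and L2-normalize the result."""
    stacked = np.zeros((num_classes, dim))
    for label, vec in feats.items():
        stacked[label] = vec
    flat = stacked.ravel()
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat


# Toy example: a 2x2 feature map with 3 channels and two regions.
fmap = np.arange(12, dtype=float).reshape(2, 2, 3)
seg = np.array([[0, 0], [1, 1]])
feats = region_features(fmap, seg, num_classes=3)
vec = image_vector(feats, num_classes=3, dim=3)
```

In this sketch, region-based retrieval would compare the per-class vectors in `feats` between two images, while `vec` gives a single fixed-length descriptor that can be compared with, for example, Euclidean distance.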

URL

https://arxiv.org/abs/1807.08634

PDF

https://arxiv.org/pdf/1807.08634.pdf

