Deep Saliency Hashing

2018-07-04 06:31:13
Sheng Jin

Abstract

In recent years, hashing methods have proven efficient for large-scale Web media search. However, existing general-purpose hashing methods have limited discriminative power for describing fine-grained objects that share a similar overall appearance but differ in subtle details. To solve this problem, we introduce an attention mechanism into the learning of hashing codes for the first time. Specifically, we propose a novel deep hashing model, named deep saliency hashing (DSaH), which automatically mines salient regions and simultaneously learns semantic-preserving hashing codes. DSaH is a two-step end-to-end model consisting of an attention network and a hashing network. Our loss function contains three basic components: the semantic loss, the saliency loss, and the quantization loss. The saliency loss guides the attention network to mine discriminative regions from pairs of images. We conduct extensive experiments on both fine-grained and general retrieval datasets for performance evaluation. Experimental results on Oxford Flowers-17 and Stanford Dogs-120 demonstrate that DSaH performs best on the fine-grained retrieval task, outperforming the previous best method (DPSH) by approximately 12%. DSaH also outperforms several state-of-the-art hashing methods on general datasets, including CIFAR-10 and NUS-WIDE.
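
The abstract names the three loss components but not their exact formulations. As a rough illustration only, the sketch below (in PyTorch) shows one way such a combined objective could be assembled; the contrastive form of the semantic loss, the margin, the saliency placeholder, and every name here are assumptions, not the paper's definitions.

import torch
import torch.nn.functional as F

def dsah_style_loss(h_i, h_j, similar, saliency_gap, alpha=1.0, beta=1.0):
    """Hypothetical combination of the three terms named in the abstract:
    semantic loss + saliency loss + quantization loss. Illustrative only;
    the actual DSaH formulations differ.

    h_i, h_j     : relaxed real-valued hash codes in (-1, 1), shape (bits,)
    similar      : 1.0 if the image pair shares a label, else 0.0
    saliency_gap : stand-in for the paper's saliency term, e.g. how far the
                   salient-region code falls short of being discriminative
    """
    bits = h_i.numel()
    dist = (h_i - h_j).pow(2).sum()
    # Contrastive-style semantic loss (assumed form): pull similar pairs
    # together, push dissimilar pairs at least a margin apart.
    semantic = similar * dist + (1.0 - similar) * F.relu(2.0 * bits - dist)
    # Saliency loss (placeholder): penalizes only a positive gap.
    saliency = F.relu(saliency_gap)
    # Quantization loss: push relaxed codes toward the binary values {-1, +1}.
    quantization = (h_i.abs() - 1.0).pow(2).sum() + (h_j.abs() - 1.0).pow(2).sum()
    return semantic + alpha * saliency + beta * quantization

# Example: a similar pair of 32-bit relaxed codes and a dummy saliency gap.
h_i = torch.tanh(torch.randn(32))
h_j = torch.tanh(torch.randn(32))
loss = dsah_style_loss(h_i, h_j, similar=1.0, saliency_gap=torch.tensor(0.3))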

URL

https://arxiv.org/abs/1807.01459

PDF

https://arxiv.org/pdf/1807.01459.pdf

