Abstract
In recent years, hashing methods have proved efficient for large-scale Web media search. However, existing general-purpose hashing methods have limited discriminative power for describing fine-grained objects that share a similar overall appearance but differ only in subtle details. To address this problem, we introduce, for the first time, an attention mechanism into the learning of hashing codes. Specifically, we propose a novel deep hashing model, named deep saliency hashing (DSaH), which automatically mines salient regions and simultaneously learns semantic-preserving hashing codes. DSaH is a two-step end-to-end model consisting of an attention network and a hashing network. Our loss function contains three basic components: the semantic loss, the saliency loss, and the quantization loss. The saliency loss guides the attention network to mine discriminative regions from pairs of images. We conduct extensive experiments on both fine-grained and general retrieval datasets for performance evaluation. Experimental results on Oxford Flowers-17 and Stanford Dogs-120 demonstrate that DSaH performs best on the fine-grained retrieval task, surpassing the best existing method (DPSH) by approximately 12%. DSaH also outperforms several state-of-the-art hashing methods on general datasets, including CIFAR-10 and NUS-WIDE.
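The saliency loss is specific to this paper, but the semantic and quantization terms it is combined with follow standard forms in the deep-hashing literature. As a minimal sketch (not the authors' exact formulation; the function names, contrastive form, and margin value are assumptions), the two standard terms can be written as:

```python
import numpy as np

def quantization_loss(h):
    # Penalize hash-layer outputs that drift away from the binary values +/-1,
    # so that thresholding at zero loses little information.
    return float(np.mean((np.abs(h) - 1.0) ** 2))

def pairwise_semantic_loss(h1, h2, similar, margin=2.0):
    # Contrastive-style pairwise loss: pull codes of semantically similar
    # images together, push dissimilar pairs at least `margin` apart.
    d = float(np.sum((h1 - h2) ** 2))
    return d if similar else max(0.0, margin - d)

# Example: near-binary codes incur almost no quantization penalty.
codes = np.array([0.9, -1.0, 1.0, -0.8])
print(quantization_loss(codes))
```

In a full model these terms would be weighted and summed with the saliency loss and minimized jointly over the attention and hashing networks.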
URL
https://arxiv.org/abs/1807.01459