Abstract
Online hashing has attracted extensive research attention for handling streaming data. However, most online hashing methods learn binary codes from pairwise similarities of training instances alone; they fail to capture higher-level semantic relationships and generalize poorly in large-scale applications with large data variations. In this paper, we propose to model the similarity distributions between the input data and the hash codes, and build on this idea a novel supervised online hashing method, dubbed Similarity Distribution based Online Hashing (SDOH), which preserves the intrinsic semantic relationships in the produced Hamming space. Specifically, we first transform the discrete similarity matrix into a probability matrix via a Gaussian-based normalization to address the extremely imbalanced distribution of similar and dissimilar pairs. We then introduce a scaled Student t-distribution to solve the challenging initialization problem and efficiently bridge the gap between the known and unknown distributions. Finally, we align the two distributions by minimizing their Kullback-Leibler divergence (KL divergence) with stochastic gradient descent (SGD), which imposes an intuitive similarity constraint that updates the hashing model on new streaming data while generalizing well to past data. Extensive experiments on three widely used benchmarks validate the superiority of the proposed SDOH over state-of-the-art methods on the online retrieval task.
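The abstract outlines a three-step pipeline: normalize the discrete similarity matrix into a probability distribution P, model pairwise relations of the (relaxed) hash codes with a scaled Student t-distribution Q, and align the two by minimizing KL(P || Q) with SGD. Since no implementation details are given here, the following NumPy sketch is only illustrative: the exact Gaussian normalization form, the bandwidth sigma, the degrees-of-freedom alpha, the tanh relaxation, and the linear projection W are assumptions for exposition, not the authors' actual formulation.

    import numpy as np

    def similarity_to_distribution(S, sigma=1.0):
        """Step 1 (assumed form): map the discrete similarity matrix S
        (entries in {0, 1}) to a probability matrix P via a Gaussian-based
        normalization, smoothing the imbalance between similar and
        dissimilar pairs; each row sums to 1."""
        G = np.exp(-(1.0 - S) ** 2 / (2.0 * sigma ** 2))
        np.fill_diagonal(G, 0.0)  # ignore self-pairs
        return G / G.sum(axis=1, keepdims=True)

    def code_distribution(B, alpha=1.0):
        """Step 2 (assumed form): model pairwise relations of the relaxed
        codes B (n x r, entries in [-1, 1]) with a scaled Student
        t-distribution, t-SNE-style, giving the 'unknown' distribution Q."""
        d2 = np.square(B[:, None, :] - B[None, :, :]).sum(-1)  # pairwise squared distances
        T = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
        np.fill_diagonal(T, 0.0)
        return T / T.sum(axis=1, keepdims=True)

    def kl_divergence(P, Q, eps=1e-12):
        """Step 3: the KL divergence KL(P || Q) that SGD would minimize
        to align the two distributions."""
        return np.sum(P * np.log((P + eps) / (Q + eps)))

    # Toy online round: a linear hashing model W maps a streaming chunk X
    # to relaxed codes B = tanh(X @ W); an SGD step on KL(P || Q) would
    # update W (gradient omitted for brevity).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 16))           # one chunk of streaming data
    labels = rng.integers(0, 2, size=8)
    S = (labels[:, None] == labels[None, :]).astype(float)
    W = rng.normal(size=(16, 32))          # 32-bit hashing projections
    P = similarity_to_distribution(S)
    Q = code_distribution(np.tanh(X @ W))
    print("KL(P || Q) =", kl_divergence(P, Q))

In this reading, the Gaussian step plays the role of turning hard 0/1 supervision into a smooth target distribution, while the heavy-tailed Student t on the code side (as in t-SNE) keeps gradients informative even when randomly initialized codes start far from the targets.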
URL
https://arxiv.org/abs/1905.13382