Abstract
Measuring visual similarity between two or more instances within a data distribution is a fundamental task in image retrieval. Theoretically, non-metric distances are able to generate a more complex and accurate similarity model than metric distances, provided that the non-linear data distribution is precisely captured by the system. In this work, we explore neural network models for learning a non-metric similarity function for instance search. We argue that non-metric similarity functions based on neural networks can build a better model of human visual perception than standard metric distances. As our proposed similarity function is differentiable, we explore a fully end-to-end trainable approach for image retrieval, i.e. we learn the weights from the input image pixels to the final similarity score. Experimental evaluation shows that non-metric similarity networks are able to learn visual similarities between images and improve performance on top of state-of-the-art image representations, boosting results on standard image retrieval datasets with respect to standard metric distances.
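The core idea of scoring a pair of images with a learned, differentiable function (rather than a fixed metric) can be sketched as follows. This is a minimal NumPy illustration, not the authors' architecture: it assumes precomputed image descriptors and uses a hypothetical two-layer MLP over the concatenated pair, whose output need not be symmetric or satisfy the triangle inequality.

```python
import numpy as np

rng = np.random.default_rng(0)

def similarity_net(x, y, W1, b1, W2, b2):
    """Learned non-metric similarity: a small MLP over the concatenated
    descriptors. Differentiable end to end; unlike a metric distance,
    it need not be symmetric or obey the triangle inequality."""
    h = np.maximum(0.0, np.concatenate([x, y]) @ W1 + b1)  # ReLU hidden layer
    return float(h @ W2 + b2)  # scalar similarity score

# Hypothetical setup: 4-D image descriptors, 8 hidden units.
d, n_hidden = 4, 8
W1 = rng.standard_normal((2 * d, n_hidden)) * 0.1
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal(n_hidden) * 0.1
b2 = 0.0

x = rng.standard_normal(d)  # stand-in for a query image descriptor
y = rng.standard_normal(d)  # stand-in for a database image descriptor
s_xy = similarity_net(x, y, W1, b1, W2, b2)
s_yx = similarity_net(y, x, W1, b1, W2, b2)
# Non-metric behavior: s(x, y) generally differs from s(y, x),
# since the inputs are concatenated rather than compared symmetrically.
```

In the paper's framing, the weights of such a network would be trained jointly with the image representation, so gradients flow from the similarity score back to the input pixels.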
URL
https://arxiv.org/abs/1709.01353