Abstract
Content-based image retrieval is the process of retrieving a subset of images from an extensive image gallery based on visual contents, such as color, shape or spatial relations, and texture. In some applications, such as localization, image retrieval is employed as the initial step. In such cases, the accuracy of the top-retrieved images significantly affects the overall system accuracy. The current paper introduces a simple yet efficient image retrieval system with fewer trainable parameters, which offers acceptable accuracy in the top-retrieved images. The proposed method benefits from a dilated residual convolutional neural network trained with a triplet loss. Experimental evaluations show that this model can extract richer information (i.e., high-resolution representations) by enlarging the receptive field, thus improving image retrieval accuracy without increasing the depth or complexity of the model. To enhance the robustness of the extracted representations, the current research obtains candidate regions of interest from each feature map and applies Generalized-Mean pooling to those regions. As the choice of triplets in a triplet-based network affects model training, we employ an online triplet mining method. We test the performance of the proposed method under various configurations on two challenging image-retrieval datasets, namely Revisited Paris6k (RPar) and UKBench. The experimental results show a mean precision at rank 10 of 94.54 and 80.23 on the RPar medium and hard modes, respectively, and a recall at rank 4 of 3.86 on the UKBench dataset.
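The abstract names two standard building blocks: Generalized-Mean (GeM) pooling over feature-map regions and a triplet loss. A minimal NumPy sketch of their usual formulations is given below; the exponent `p` and the `margin` value are illustrative assumptions, not the paper's reported hyperparameters.

```python
import numpy as np

def gem_pool(fmap, p=3.0, eps=1e-6):
    """Generalized-Mean (GeM) pooling over the spatial dimensions of a
    C x H x W feature map: ((1/HW) * sum(x^p))^(1/p) per channel.
    p = 1 reduces to average pooling; large p approaches max pooling."""
    x = np.clip(fmap, eps, None)  # keep activations positive for the power mean
    return np.power(np.mean(np.power(x, p), axis=(1, 2)), 1.0 / p)

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard margin-based triplet loss on embedding vectors:
    max(0, d(a, p) - d(a, n) + margin), with Euclidean distance."""
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)
```

In an online-mining setup such as the one the abstract mentions, triplets are typically selected inside each mini-batch (e.g., the hardest negatives per anchor) rather than fixed in advance, so the loss above is evaluated only on informative triplets.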
Abstract (translated)
Content-based image retrieval is the process of retrieving specific images from an extensive image gallery based on visual contents, such as color, shape or spatial relations, and texture. In some applications, such as localization, image retrieval serves as the initial step. In such cases, the accuracy of the top-retrieved images significantly affects the accuracy of the overall system. This paper introduces a simple yet efficient image retrieval system with fewer trainable parameters that offers acceptable accuracy in the top-retrieved images. The method benefits from a dilated residual convolutional neural network trained with a triplet loss. Experimental evaluations show that the model can extract richer information (i.e., high-resolution representations) by enlarging the receptive field, improving image retrieval accuracy without increasing the depth or complexity of the model. To improve the robustness of the extracted representations, this work obtains candidate regions of interest from each feature map and applies Generalized-Mean pooling to those regions. Since the choice of triplets in a triplet-based network affects model training, an online triplet mining method is adopted. The performance of the proposed method is tested under various configurations on challenging image-retrieval datasets, namely Revisited Paris6k (RPar) and UKBench. The experimental results show a mean precision at rank 10 of 94.54 and 80.23 on the RPar medium and hard modes, respectively, and a recall at rank 4 of 3.86 on the UKBench dataset.
URL
https://arxiv.org/abs/2303.08398