
A Triplet-loss Dilated Residual Network for High-Resolution Representation Learning in Image Retrieval

2023-03-15 07:01:44
Saeideh Yousefzadeh, Hamidreza Pourreza, Hamidreza Mahyar

Abstract

Content-based image retrieval is the process of retrieving a subset of images from a large image gallery based on visual content, such as color, shape, texture, and spatial relations. In some applications, such as localization, image retrieval is employed as the initial step. In such cases, the accuracy of the top-retrieved images significantly affects the overall system accuracy. This paper introduces a simple yet efficient image retrieval system with fewer trainable parameters, which offers acceptable accuracy in the top-retrieved images. The proposed method uses a dilated residual convolutional neural network trained with triplet loss. Experimental evaluations show that this model can extract richer information (i.e., high-resolution representations) by enlarging the receptive field, thus improving image retrieval accuracy without increasing the depth or complexity of the model. To make the extracted representations more robust, the method obtains candidate regions of interest from each feature map and applies Generalized-Mean pooling to the regions. Because the choice of triplets in a triplet-based network affects model training, we employ an online triplet mining method. We test the performance of the proposed method under various configurations on two challenging image-retrieval datasets, namely Revisited Paris6k (RPar) and UKBench. The experimental results show a mean precision at rank 10 of 94.54 and 80.23 in the RPar medium and hard modes, respectively, and a recall at rank 4 of 3.86 on the UKBench dataset.
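The three ingredients named in the abstract (dilated convolutions, Generalized-Mean pooling, and online triplet mining) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the pooling exponent `p=3.0`, the margin, and the choice of the "batch-hard" mining strategy are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1).

    Dilation widens the receptive field without adding weights.
    """
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def gem_pool(activations, p=3.0, eps=1e-6):
    """Generalized-Mean pooling over one channel's activations.

    p = 1 is average pooling; a large p approaches max pooling.
    The exponent p = 3.0 is an assumed default, not a value from the paper.
    """
    clamped = [max(a, eps) for a in activations]  # keep powers well defined
    return (sum(a ** p for a in clamped) / len(clamped)) ** (1.0 / p)


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def batch_hard_triplet_loss(embeddings, labels, margin=0.1):
    """Triplet loss with batch-hard online mining (an assumed strategy).

    For each anchor, pick the farthest same-label sample (hardest positive)
    and the closest other-label sample (hardest negative) within the batch.
    Anchors with no positive or no negative in the batch are skipped.
    """
    losses = []
    for i, anchor in enumerate(embeddings):
        positives = [
            euclidean(anchor, e)
            for j, e in enumerate(embeddings)
            if j != i and labels[j] == labels[i]
        ]
        negatives = [
            euclidean(anchor, e)
            for j, e in enumerate(embeddings)
            if labels[j] != labels[i]
        ]
        if not positives or not negatives:
            continue
        losses.append(max(0.0, max(positives) - min(negatives) + margin))
    return sum(losses) / len(losses) if losses else 0.0
```

For instance, `effective_kernel_size(3, 2)` shows that a 3x3 kernel with dilation 2 spans 5 positions per axis while keeping 9 weights, which is the sense in which the receptive field grows without extra depth or parameters. In a real system these operations would run on tensors in a deep learning framework rather than on Python lists.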


URL

https://arxiv.org/abs/2303.08398

PDF

https://arxiv.org/pdf/2303.08398.pdf

