Abstract
How important is it for training and evaluation sets to have no class overlap in image retrieval? We revisit Google Landmarks v2 clean, the most popular training set, by identifying and removing class overlap with Revisited Oxford and Paris [34], the most popular evaluation set. Comparing the original and the new RGLDv2-clean on a benchmark of reproduced state-of-the-art methods yields striking findings: not only does performance drop dramatically, but the drop is inconsistent across methods, changing the ranking.

What does it take to focus on objects of interest and ignore background clutter when indexing? Do we need to train an object detector and the representation separately? Do we need location supervision? We introduce Single-stage Detect-to-Retrieve (CiDeR), an end-to-end, single-stage pipeline to detect objects of interest and extract a global image representation. We outperform the previous state of the art on both existing training sets and the new RGLDv2-clean. Our dataset is publicly available.
URL
https://arxiv.org/abs/2404.01524