Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval

Abstract

Visual retrieval aims to search for the most relevant visual items, e.g., images and videos, from a candidate gallery with a given query item. Accuracy and efficiency are two competing objectives in retrieval tasks. Instead of crafting a new method pursuing further improvement on accuracy, in this paper we propose a multi-teacher distillation framework Whiten-MTD, which is able to transfer knowledge from off-the-shelf pre-trained retrieval models to a lightweight student model for efficient visual retrieval. Furthermore, we discover that the similarities obtained by different retrieval models are diversified and incommensurable, which makes it challenging to jointly distill knowledge from multiple models. Therefore, we propose to whiten the output of teacher models before fusion, which enables effective multi-teacher distillation for retrieval models. Whiten-MTD is conceptually simple and practically effective. Extensive experiments on two landmark image retrieval datasets and one video retrieval dataset demonstrate the effectiveness of our proposed method, and its good balance of retrieval performance and efficiency. Our source code is released at this https URL.
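
The incommensurability problem described above can be illustrated with a minimal sketch. It is not the Whiten-MTD implementation: it assumes, purely for illustration, that "whitening" can be approximated by standardizing each teacher's similarity scores to zero mean and unit variance, and that fusion is a plain average. The function names and the score values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Rescale one teacher's scores to zero mean and unit variance.

    Assumption: a scalar stand-in for whitening, not the paper's procedure.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_teachers(teacher_scores):
    """Average the standardized scores of all teachers, item by item."""
    normalized = [standardize(scores) for scores in teacher_scores]
    return [mean(column) for column in zip(*normalized)]


# Hypothetical similarities of one query to four gallery items.
teacher_a = [0.85, 0.91, 0.90, 0.88]  # cosine-like scores, narrow range
teacher_b = [12.0, 11.0, 3.0, 1.0]    # unnormalized scores, wide range

# Naive averaging: the teacher with the larger scale decides the ranking.
naive = [mean(pair) for pair in zip(teacher_a, teacher_b)]

# Averaging after standardization: both teachers contribute comparably.
fused = fuse_teachers([teacher_a, teacher_b])

print("naive top item:", naive.index(max(naive)))
print("fused top item:", fused.index(max(fused)))
```

In this toy setting the naive average simply follows teacher_b, whose scores span a much wider range, while the standardized average lets teacher_a influence the ranking as well. A student model would then be trained to match the fused scores.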

URL

https://arxiv.org/abs/2312.09716

PDF

https://arxiv.org/pdf/2312.09716.pdf
