A Versatile Framework for Multi-scene Person Re-identification


Abstract

Person re-identification (ReID) has been studied extensively for a decade, with the goal of learning to associate images of the same person across non-overlapping camera views. To overcome the significant variations between images from different camera views, a large number of ReID model variants have been developed to address specific challenges such as resolution change, clothing change, occlusion, and modality change. Despite their impressive performance, these variants typically function in isolation, and a model built for one challenge cannot be applied to the others. To the best of our knowledge, there is no versatile ReID model that can handle these challenges at the same time. This work is the first attempt at learning such a versatile ReID model. Our main idea is a two-stage, prompt-based twin modeling framework called VersReID. In the first stage, VersReID leverages scene labels to train a ReID Bank that contains abundant knowledge for handling various scenes, with several groups of scene-specific prompts encoding the knowledge specific to each scene. In the second stage, we distill a V-Branch model with versatile prompts from the ReID Bank to adaptively solve ReID in different scenes, eliminating the need for scene labels at inference. To facilitate training VersReID, we further introduce multi-scene properties into self-supervised learning of ReID via a multi-scene prioris data augmentation (MPDA) strategy. Through extensive experiments, we demonstrate that an effective and versatile ReID model can be learned to handle ReID tasks under multi-scene conditions, including general, low-resolution, clothing-change, occlusion, and cross-modality scenes, without manual assignment of scene labels at inference. Code and models are available at this https URL.
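
The two-stage idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the mean-pooling `encode` function stands in for a real backbone, the mean-squared-error `distill_loss` is an assumed choice of distillation objective, and all names (`SCENES`, `bank_feature`, `v_branch_feature`) are hypothetical. It only shows the structural difference between the two models: the ReID Bank needs a scene label to pick its prompts, while the V-Branch uses one shared prompt set and needs no label.

```python
from typing import Dict, List

Vector = List[float]

# Scene names taken from the abstract; the identifiers themselves are hypothetical.
SCENES = ["general", "low_resolution", "clothing_change", "occlusion", "cross_modality"]


def encode(tokens: List[Vector], prompts: List[Vector]) -> Vector:
    """Toy stand-in for a backbone: mean-pool the prompts together with the image tokens."""
    rows = prompts + tokens
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def bank_feature(tokens: List[Vector], scene: str,
                 scene_prompts: Dict[str, List[Vector]]) -> Vector:
    """Stage 1 (teacher): the scene label selects a scene-specific prompt group."""
    return encode(tokens, scene_prompts[scene])


def v_branch_feature(tokens: List[Vector], versatile_prompts: List[Vector]) -> Vector:
    """Stage 2 (student): one versatile prompt set, no scene label required."""
    return encode(tokens, versatile_prompts)


def distill_loss(student: Vector, teacher: Vector) -> float:
    """Assumed objective: mean squared error between student and teacher features."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


# One prompt vector per scene, plus a single versatile prompt for the student.
scene_prompts = {scene: [[float(i), 0.0]] for i, scene in enumerate(SCENES)}
versatile_prompts = [[0.0, 0.0]]
tokens = [[1.0, 1.0], [3.0, 1.0]]  # toy "image tokens"

teacher = bank_feature(tokens, "occlusion", scene_prompts)  # label used in training only
student = v_branch_feature(tokens, versatile_prompts)       # label-free at inference
loss = distill_loss(student, teacher)
```

In a real system the versatile prompts and the student backbone would be updated to reduce this loss across images from all scenes, so that the student absorbs the scene-specific knowledge without being told which scene an image comes from.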


URL

https://arxiv.org/abs/2403.11121

PDF

https://arxiv.org/pdf/2403.11121.pdf
