Unsupervised person re-identification aims to retrieve images of a specified person without identity labels. Many recent unsupervised Re-ID approaches adopt clustering-based methods that measure cross-camera feature similarity to roughly divide images into clusters. They ignore the feature distribution discrepancy induced by the camera domain gap, resulting in unavoidable performance degradation. Camera information is usually available, and the feature distribution within a single camera typically focuses more on the appearance of the individual and has less intra-identity variance. Inspired by this observation, we introduce a Camera-Aware Label Refinement (CALR) framework that reduces camera discrepancy by clustering intra-camera similarity. Specifically, we employ intra-camera training to obtain reliable local pseudo labels within each camera, then refine the global labels generated by inter-camera clustering and train the discriminative model with the more reliable global pseudo labels in a self-paced manner. Meanwhile, we develop a camera-alignment module to align feature distributions under different cameras, which further mitigates camera variance. Extensive experiments validate the superiority of our proposed method over state-of-the-art approaches. The code is accessible at this https URL.
https://arxiv.org/abs/2403.16450
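As a rough illustration of the intra-camera step described above, the following minimal numpy sketch groups features by camera and assigns local pseudo labels with a greedy leader-style clustering. The threshold-based clustering, the `sim_thresh` value, and the function name are assumptions made here for illustration; the paper's actual clustering and label-refinement procedure are more involved.

```python
import numpy as np

def intra_camera_pseudo_labels(feats, cam_ids, sim_thresh=0.8):
    """Assign local pseudo labels independently within each camera.

    feats: (N, D) L2-normalized feature vectors.
    cam_ids: (N,) camera index per image.
    Greedy leader clustering stands in for the clustering used in the paper.
    Returns per-image labels of the form (camera, local_cluster).
    """
    labels = [None] * len(feats)
    for cam in np.unique(cam_ids):
        idx = np.where(cam_ids == cam)[0]
        centers, members = [], []          # running clusters for this camera
        for i in idx:
            f = feats[i]
            if centers:
                sims = np.array([c @ f for c in centers])
                j = int(sims.argmax())
                if sims[j] >= sim_thresh:
                    members[j].append(i)
                    c = feats[members[j]].mean(axis=0)   # update center
                    centers[j] = c / np.linalg.norm(c)
                    labels[i] = (int(cam), j)
                    continue
            centers.append(f.copy())       # start a new local cluster
            members.append([i])
            labels[i] = (int(cam), len(centers) - 1)
    return labels
```

Because labels are namespaced by camera, images from different cameras never share a local pseudo label; merging them is the job of the subsequent inter-camera clustering.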
Lifelong Person Re-Identification (LReID) aims to continuously learn from successive data streams, matching individuals across multiple cameras. The key challenge for LReID is how to effectively preserve old knowledge while learning new information incrementally. Task-level domain gaps and limited old-task datasets are key factors leading to catastrophic forgetting in LReID, which are overlooked in existing methods. To alleviate this problem, we propose a novel Diverse Representation Embedding (DRE) framework for LReID. The proposed DRE preserves old knowledge while adapting to new information at both the instance level and the task level. Concretely, an Adaptive Constraint Module (ACM) is proposed to implement integration and push-away operations between multiple representations, obtaining a dense embedding subspace for each instance to improve matching ability on limited old-task datasets. Based on the processed diverse representations, we exchange knowledge between the adjustment model and the learner model through Knowledge Update (KU) and Knowledge Preservation (KP) strategies at the task level, which reduce the task-wise domain gap on both old and new tasks and exploit the diverse representation of each instance in the limited datasets from old tasks, improving model performance over extended periods. Extensive experiments were conducted on eleven Re-ID datasets, including five seen datasets for training under the order-1 and order-2 training orders and six unseen datasets for inference. Compared to state-of-the-art methods, our method achieves significantly improved performance on holistic, large-scale, and occluded datasets.
https://arxiv.org/abs/2403.16003
Person re-identification (ReID) has made great strides thanks to data-driven deep learning techniques. However, existing benchmark datasets lack diversity, and models trained on these data cannot generalize well to dynamic wild scenarios. To improve the explicit generalization of ReID models, we develop a new Open-World, Diverse, Cross-Spatial-Temporal dataset named OWD with several distinct features. 1) Diverse collection scenes: multiple independent open-world and highly dynamic collection scenes, including streets, intersections, shopping malls, etc. 2) Diverse lighting variations: long time spans from daytime to nighttime with abundant illumination changes. 3) Diverse person status: multiple camera networks across all seasons with normal/adverse weather conditions and diverse pedestrian appearances (e.g., clothes, personal belongings, poses). 4) Protected privacy: invisible faces for privacy-critical applications. To improve the implicit generalization of ReID, we further propose a Latent Domain Expansion (LDE) method to develop the potential of source data, which decouples discriminative identity-relevant and trustworthy domain-relevant features and implicitly enforces domain-randomized identity feature space expansion with richer domain diversity to facilitate domain-invariant representations. Our comprehensive evaluations on most benchmark datasets in the community are crucial for progress, although this work is still far from the grand goal of open-world and dynamic wild applications.
https://arxiv.org/abs/2403.15119
Existing person re-identification methods have achieved remarkable advances in appearance-based identity association across homogeneous cameras, such as ground-ground matching. However, aerial-ground person re-identification (AGPReID) among heterogeneous cameras, a more practical scenario, has received minimal attention. To alleviate the disruption of discriminative identity representation by dramatic view discrepancy, the most significant challenge in AGPReID, the view-decoupled transformer (VDT) is proposed as a simple yet effective framework. Two major components are designed in VDT to decouple view-related and view-unrelated features, namely hierarchical subtractive separation and orthogonal loss, where the former separates these two features inside the VDT, and the latter constrains them to be independent. In addition, we contribute a large-scale AGPReID dataset called CARGO, consisting of five/eight aerial/ground cameras, 5,000 identities, and 108,563 images. Experiments on two datasets show that VDT is a feasible and effective solution for AGPReID, surpassing the previous method on mAP/Rank-1 by up to 5.0%/2.7% on CARGO and 3.7%/5.2% on AG-ReID, while keeping the same order of computational complexity. Our project is available at this https URL.
https://arxiv.org/abs/2403.14513
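The orthogonal loss mentioned above can be sketched as a penalty on the cosine similarity between paired view-related and view-unrelated features. The squared-cosine form below is an assumption made for illustration; the paper's exact formulation may differ.

```python
import numpy as np

def orthogonal_loss(view_feats, id_feats, eps=1e-12):
    """Penalize correlation between view-related and view-unrelated features.

    view_feats, id_feats: (N, D) feature batches from the two branches.
    The loss is the mean squared cosine similarity between paired rows;
    it reaches 0 exactly when every pair is orthogonal.
    """
    v = view_feats / (np.linalg.norm(view_feats, axis=1, keepdims=True) + eps)
    u = id_feats / (np.linalg.norm(id_feats, axis=1, keepdims=True) + eps)
    cos = np.sum(v * u, axis=1)       # per-pair cosine similarity
    return float(np.mean(cos ** 2))
```

Driving this loss to zero makes the two decoupled feature sets carry independent directions, which is the "constrains these two to be independent" role described in the abstract.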
Person re-identification (re-id), which aims to retrieve images of the same person as in a given image from a database, is one of the most practical image recognition applications. In the real world, however, the environments in which the images are taken change over time. This causes a distribution shift between training and testing and degrades re-id performance. To maintain re-id performance, models should keep adapting to the test environment's temporal changes. Test-time adaptation (TTA), which adapts models to the test environment with only unlabeled test data, is a promising way to handle this problem because TTA can adapt models instantly in the test environment. However, previous TTA methods are designed for classification and cannot be directly applied to re-id. This is because in re-id the set of people's identities differs between training and testing, whereas the set of classes is fixed in current TTA methods designed for classification. To improve re-id performance in changing test environments, we propose TEst-time similarity Modification for Person re-identification (TEMP), a novel TTA method for re-id. TEMP is the first fully test-time adaptation method for re-id that requires no modification to pre-training. Inspired by TTA methods that refine prediction uncertainty in classification, we aim to refine uncertainty in re-id. However, the uncertainty cannot be computed in the same way as in classification, since re-id is an open-set task that does not share person labels between training and testing. Hence, we propose re-id entropy, an alternative uncertainty measure for re-id computed from the similarity between feature vectors. Experiments show that re-id entropy can measure uncertainty in re-id and that TEMP improves re-id performance in online settings where the distribution changes over time.
https://arxiv.org/abs/2403.14114
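The re-id entropy idea, an uncertainty measure computed from feature similarities rather than class logits, can be sketched as the entropy of a softmax over cosine similarities to a reference set. The softmax-with-temperature form and the `temperature` value are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def reid_entropy(query_feat, ref_feats, temperature=0.1):
    """Uncertainty of a re-id query, computed without class labels.

    A softmax over cosine similarities to a reference set (mini-batch or
    gallery features) turns similarities into a distribution; its entropy
    is low when the query clearly matches few references, high when the
    query is equally similar to many.
    """
    q = query_feat / np.linalg.norm(query_feat)
    r = ref_feats / np.linalg.norm(ref_feats, axis=1, keepdims=True)
    logits = (r @ q) / temperature
    logits -= logits.max()                        # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return float(-(p * np.log(p + 1e-12)).sum())
```

Because the measure needs only feature similarities, it remains well defined in the open-set setting where train and test identities are disjoint.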
Visible-Infrared Person Re-identification (VI-ReID) is a challenging cross-modal pedestrian retrieval task, due to significant intra-class variations and cross-modal discrepancies among different cameras. Existing works mainly focus on embedding images of different modalities into a unified space to mine modality-shared features. They only seek distinctive information within these shared features, while ignoring the identity-aware useful information that is implicit in the modality-specific features. To address this issue, we propose a novel Implicit Discriminative Knowledge Learning (IDKL) network to uncover and leverage the implicit discriminative information contained within the modality-specific features. First, we extract modality-specific and modality-shared features using a novel dual-stream network. Then, the modality-specific features undergo purification to reduce their modality style discrepancies while preserving identity-aware discriminative knowledge. Subsequently, this implicit knowledge is distilled into the modality-shared feature to enhance its distinctiveness. Finally, an alignment loss is proposed to minimize modality discrepancy on the enhanced modality-shared features. Extensive experiments on multiple public datasets demonstrate the superiority of the IDKL network over state-of-the-art methods. Code is available at this https URL.
https://arxiv.org/abs/2403.11708
Person Re-identification (ReID) has been extensively developed for a decade in order to learn the association of images of the same person across non-overlapping camera views. To overcome significant variations between images across camera views, numerous variants of ReID models have been developed to address a number of challenges, such as resolution change, clothing change, occlusion, modality change, and so on. Despite the impressive performance of many ReID variants, these variants typically function in isolation and cannot be applied to other challenges. To the best of our knowledge, there is no versatile ReID model that can handle various ReID challenges at the same time. This work makes the first attempt at learning such a versatile ReID model. Our main idea is to form a two-stage prompt-based twin modeling framework called VersReID. VersReID first leverages the scene label to train a ReID Bank that contains abundant knowledge for handling various scenes, where several groups of scene-specific prompts are used to encode different scene-specific knowledge. In the second stage, we distill a V-Branch model with versatile prompts from the ReID Bank for adaptively solving the ReID of different scenes, eliminating the demand for scene labels during the inference stage. To facilitate training VersReID, we further introduce multi-scene properties into self-supervised learning of ReID via a multi-scene prioris data augmentation (MPDA) strategy. Through extensive experiments, we demonstrate the success of learning an effective and versatile ReID model that handles ReID tasks under multi-scene conditions without manual assignment of scene labels in the inference stage, including general, low-resolution, clothing change, occlusion, and cross-modality scenes. Codes and models are available at this https URL.
https://arxiv.org/abs/2403.11121
A key challenge in visible-infrared person re-identification (V-I ReID) is training a backbone model capable of effectively addressing the significant discrepancies across modalities. State-of-the-art methods that generate a single intermediate bridging domain are often less effective, as this generated domain may not adequately capture sufficient common discriminant information. This paper introduces Bidirectional Multi-step Domain Generalization (BMDG), a novel approach for unifying feature representations across diverse modalities. BMDG creates multiple virtual intermediate domains by finding and aligning body part features extracted from both the I and V modalities. Specifically, BMDG reduces the modality gap in two steps. First, it aligns modalities in feature space by learning shared, modality-invariant body part prototypes from V and I images. Then, it generalizes the feature representation by applying bidirectional multi-step learning, which progressively refines the feature representation at each step and incorporates more prototypes from both modalities. In particular, our method minimizes the cross-modal gap by identifying and aligning shared prototypes that capture key discriminative features across modalities, then uses multiple bridging steps based on this information to enhance the feature representation. Experiments conducted on challenging V-I ReID datasets indicate that our BMDG approach outperforms state-of-the-art part-based models and methods that generate an intermediate domain for V-I person ReID.
https://arxiv.org/abs/2403.10782
Lifelong person re-identification (LReID) assumes a practical scenario where the model is sequentially trained on continuously incoming datasets while alleviating catastrophic forgetting on the old datasets. However, not only the training datasets but also the gallery images are incrementally accumulated, which requires a huge amount of computation and storage space to extract the features at the inference phase. In this paper, we address the above-mentioned problem by incorporating backward-compatibility into LReID for the first time. We train the model using the continuously incoming datasets while maintaining the model's compatibility with the previously trained old models, without re-computing the features of the old gallery images. To this end, we devise a cross-model compatibility loss based on contrastive learning with respect to the replay features across all the old datasets. Moreover, we develop a knowledge consolidation method based on part classification to learn a representation shared across different datasets for backward-compatibility. We also suggest a more practical methodology for performance evaluation, where all the gallery and query images are considered together. Experimental results demonstrate that the proposed method achieves significantly higher backward-compatibility performance than the existing methods, making it a promising tool for more practical LReID scenarios.
https://arxiv.org/abs/2403.10022
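A hedged sketch of the cross-model compatibility idea: features from the model being trained are pulled toward the frozen old model's replayed features of the same identity and pushed away from those of other identities, so old gallery features remain directly searchable. The InfoNCE-style form below is an assumption; the paper describes the loss only as contrastive learning over replay features.

```python
import numpy as np

def cross_model_compat_loss(new_feats, new_labels, old_feats, old_labels,
                            temperature=0.1):
    """Backward-compatibility loss between new and old embedding spaces.

    new_feats: (N, D) features from the model being trained.
    old_feats: (M, D) replayed features from the frozen old model.
    Each new feature is attracted to old features sharing its identity
    label and repelled from old features of other identities.
    """
    n = new_feats / np.linalg.norm(new_feats, axis=1, keepdims=True)
    o = old_feats / np.linalg.norm(old_feats, axis=1, keepdims=True)
    logits = (n @ o.T) / temperature              # (N, M) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    pos = (new_labels[:, None] == old_labels[None, :])
    loss = -np.log((exp * pos).sum(axis=1) / exp.sum(axis=1) + 1e-12)
    return float(loss.mean())
```

A low loss means a new query feature, compared against un-recomputed old gallery features, still ranks same-identity images first.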
Cloth-changing person re-identification aims to retrieve and identify specific pedestrians by using cloth-irrelevant features in person cloth-changing scenarios. However, pedestrian images captured by surveillance probes usually contain occlusions in real-world scenarios. The performance of existing cloth-changing re-identification methods is significantly degraded due to the reduction of discriminative cloth-irrelevant features caused by occlusion. We define cloth-changing person re-identification in occlusion scenarios as occluded cloth-changing person re-identification (Occ-CC-ReID), and to the best of our knowledge, we are the first to propose occluded cloth-changing person re-identification as a new task. We constructed two occluded cloth-changing person re-identification datasets for different occlusion scenarios: Occluded-PRCC and Occluded-LTCC. The datasets can be obtained from the following link: this https URL Re-Identification.
https://arxiv.org/abs/2403.08557
Cloth-Changing Person Re-Identification (CC-ReID) aims to accurately identify the target person in more realistic surveillance scenarios, where pedestrians usually change their clothing. Despite great progress, limited cloth-changing training samples in existing CC-ReID datasets still prevent the model from adequately learning cloth-irrelevant features. In addition, due to the absence of explicit supervision to keep the model constantly focused on cloth-irrelevant areas, existing methods are still hampered by the disruption of clothing variations. To solve the above issues, we propose an Identity-aware Dual-constraint Network (IDNet) for the CC-ReID task. Specifically, to help the model extract cloth-irrelevant clues, we propose a Clothes Diversity Augmentation (CDA), which generates more realistic cloth-changing samples by enriching the clothing color while preserving the texture. In addition, a Multi-scale Constraint Block (MCB) is designed, which extracts fine-grained identity-related features and effectively transfers cloth-irrelevant knowledge. Moreover, a Counterfactual-guided Attention Module (CAM) is presented, which learns cloth-irrelevant features from channel and space dimensions and utilizes the counterfactual intervention for supervising the attention map to highlight identity-related regions. Finally, a Semantic Alignment Constraint (SAC) is designed to facilitate high-level semantic feature interaction. Comprehensive experiments on four CC-ReID datasets indicate that our method outperforms prior state-of-the-art approaches.
https://arxiv.org/abs/2403.08270
Visible-infrared person re-identification (VI-ReID) is challenging due to considerable cross-modality discrepancies. Existing works mainly focus on learning modality-invariant features while suppressing modality-specific ones. However, retrieving visible images based only on infrared samples is an extreme problem because of the absence of color information. To this end, we present the Refer-VI-ReID setting, which aims to match target visible images from both infrared images and coarse language descriptions (e.g., "a man with a red top and black pants") to complement the missing color information. To address this task, we design a Y-Y-shape decomposition structure, dubbed YYDS, to decompose and aggregate texture and color features of targets. Specifically, a text-IoU regularization strategy is first presented to facilitate the decomposition training, and a joint relation module is then proposed to infer the aggregation. Furthermore, a cross-modal version of the k-reciprocal re-ranking algorithm, named CMKR, is investigated, in which three neighbor search strategies and one local query expansion method are explored to alleviate the modality bias problem of near neighbors. We conduct experiments on the SYSU-MM01, RegDB, and LLCM datasets with our manually annotated descriptions. Both YYDS and CMKR achieve remarkable improvements over SOTA methods on all three datasets. Codes are available at this https URL.
https://arxiv.org/abs/2403.04183
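CMKR builds on k-reciprocal re-ranking. As background, the basic k-reciprocal neighbor computation (two samples are k-reciprocal neighbors iff each lies within the other's k-nearest neighbors) can be sketched as follows; the paper's cross-modal search strategies and local query expansion are not reproduced here.

```python
import numpy as np

def k_reciprocal_neighbors(dist, k):
    """Indices j that are k-reciprocal neighbors of each sample i.

    dist: (N, N) pairwise distance matrix (e.g. stacked query + gallery).
    j is a k-reciprocal neighbor of i iff each appears in the other's
    k-nearest-neighbor list (the point itself included, as is conventional).
    """
    knn = np.argsort(dist, axis=1)[:, :k + 1]      # k-NN list incl. self
    recip = []
    for i in range(dist.shape[0]):
        cands = knn[i]
        recip.append(np.array([j for j in cands if i in knn[j]]))
    return recip
```

The mutual requirement filters out asymmetric matches (a hard negative may be in your neighbor list without you being in its list), which is what makes re-ranking with these sets more reliable than plain k-NN.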
This work addresses the task of long-term person re-identification. Typically, person re-identification assumes that people do not change their clothes, which limits its applications to short-term scenarios. To overcome this limitation, we investigate long-term person re-identification, which considers both clothes-changing and clothes-consistent scenarios. In this paper, we propose a novel framework that effectively learns and utilizes both global and local information. The proposed framework consists of three streams: global, local body part, and head streams. The global and head streams encode identity-relevant information from an entire image and a cropped image of the head region, respectively. Both streams encode the most distinct, less distinct, and average features using the combinations of adversarial erasing, max pooling, and average pooling. The local body part stream extracts identity-related information for each body part, allowing it to be compared with the same body part from another image. Since body part annotations are not available in re-identification datasets, pseudo-labels are generated using clustering. These labels are then utilized to train a body part segmentation head in the local body part stream. The proposed framework is trained by backpropagating the weighted summation of the identity classification loss, the pair-based loss, and the pseudo body part segmentation loss. To demonstrate the effectiveness of the proposed method, we conducted experiments on three publicly available datasets (Celeb-reID, PRCC, and VC-Clothes). The experimental results demonstrate that the proposed method outperforms the previous state-of-the-art method.
https://arxiv.org/abs/2403.02892
Recent unsupervised person re-identification (re-ID) methods achieve high performance by leveraging fine-grained local context. These methods are referred to as part-based methods. However, most part-based methods obtain local contexts through horizontal division, which suffer from misalignment due to various human poses. Additionally, the misalignment of semantic information in part features restricts the use of metric learning, thus affecting the effectiveness of part-based methods. The two issues mentioned above result in the under-utilization of part features in part-based methods. We introduce the Spatial Cascaded Clustering and Weighted Memory (SCWM) method to address these challenges. SCWM aims to parse and align more accurate local contexts for different human body parts while allowing the memory module to balance hard example mining and noise suppression. Specifically, we first analyze the foreground omissions and spatial confusions issues in the previous method. Then, we propose foreground and space corrections to enhance the completeness and reasonableness of the human parsing results. Next, we introduce a weighted memory and utilize two weighting strategies. These strategies address hard sample mining for global features and enhance noise resistance for part features, which enables better utilization of both global and part features. Extensive experiments on Market-1501 and MSMT17 validate the proposed method's effectiveness over many state-of-the-art methods.
https://arxiv.org/abs/2403.00261
Unsupervised visible-infrared person re-identification (USVI-ReID) aims to match specified people in infrared images to visible images without annotation, and vice versa. USVI-ReID is a challenging yet under-explored task. Most existing methods address the USVI-ReID problem using cluster-based contrastive learning, which simply employs the cluster center as the representation of a person. However, the cluster center primarily captures shared information and overlooks disparity. To address this problem, we propose a Progressive Contrastive Learning with Multi-Prototype (PCLMP) method for USVI-ReID. In brief, we first generate the hard prototype by selecting the sample with the maximum distance from the cluster center. This hard prototype is used in the contrastive loss to emphasize disparity. Additionally, instead of rigidly aligning query images to a specific prototype, we generate the dynamic prototype by randomly picking samples within a cluster. This dynamic prototype retains the natural variety of features while reducing instability in the simultaneous learning of both common and disparate information. Finally, we introduce a progressive learning strategy to gradually shift the model's attention towards hard samples, avoiding cluster deterioration. Extensive experiments conducted on the publicly available SYSU-MM01 and RegDB datasets validate the effectiveness of the proposed method. PCLMP outperforms the existing state-of-the-art method with an average mAP improvement of 3.9%. The source code will be released.
https://arxiv.org/abs/2402.19026
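The two prototype constructions described above are simple to state: the hard prototype is the cluster member farthest from the center, and the dynamic prototype is a randomly drawn member. A minimal numpy sketch (function names are mine, chosen for illustration):

```python
import numpy as np

def hard_prototype(cluster_feats, center):
    """Cluster member farthest from the center (emphasizes disparity)."""
    d = np.linalg.norm(cluster_feats - center, axis=1)
    return cluster_feats[int(d.argmax())]

def dynamic_prototype(cluster_feats, rng):
    """Randomly picked member (retains natural feature variety)."""
    return cluster_feats[rng.integers(len(cluster_feats))]
```

Both would then be plugged into the contrastive loss alongside the usual cluster-center prototype; the progressive weighting between them over training is not sketched here.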
Online Unsupervised Domain Adaptation (OUDA) for person Re-Identification (Re-ID) is the task of continuously adapting a model trained on a well-annotated source domain dataset to a target domain observed as a data stream. In OUDA, person Re-ID models face two main challenges: catastrophic forgetting and domain shift. In this work, we propose a new Source-guided Similarity Preservation (S2P) framework to alleviate these two problems. Our framework is based on the extraction of a support set composed of source images that maximizes the similarity with the target data. This support set is used to identify feature similarities that must be preserved during the learning process. S2P can incorporate multiple existing UDA methods to mitigate catastrophic forgetting. Our experiments show that S2P outperforms previous state-of-the-art methods on multiple real-to-real and synthetic-to-real challenging OUDA benchmarks.
https://arxiv.org/abs/2402.15206
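The support-set selection can be sketched as scoring each source feature by its similarity to the current target batch and keeping the top-k. Mean cosine similarity is an assumed scoring rule; the paper only states that the support set maximizes similarity with the target data.

```python
import numpy as np

def source_support_set(source_feats, target_feats, k):
    """Select the k source images most similar to the target stream.

    The score of each source feature is its mean cosine similarity to
    the current target batch; the top-k form the support set whose
    feature similarities are preserved during adaptation.
    """
    s = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    t = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    scores = (s @ t.T).mean(axis=1)        # per-source mean similarity
    return np.argsort(-scores)[:k]
```

Anchoring the similarity-preservation term on labeled source images that already resemble the target domain is what lets the framework counter forgetting without any target labels.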
Long-term Person Re-Identification (LRe-ID) aims at matching an individual across cameras after a long period of time, under variations in clothing, pose, and viewpoint. In this work, we propose CCPA, a Contrastive Clothing and Pose Augmentation framework for LRe-ID. Beyond appearance, CCPA captures cloth-invariant body shape information using a Relation Graph Attention Network. Training a robust LRe-ID model requires a wide range of clothing variations and expensive clothing labels, which current LRe-ID datasets lack. To address this, we perform clothing and pose transfer across identities to generate images with more clothing variations and of different persons wearing similar clothing. The augmented batch of images serves as input to our proposed Fine-grained Contrastive Losses, which not only supervise the Re-ID model to learn discriminative person embeddings under long-term scenarios but also ensure in-distribution data generation. Results on LRe-ID datasets demonstrate the effectiveness of our CCPA framework.
https://arxiv.org/abs/2402.14454
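A minimal sketch of the kind of contrastive supervision described above, where augmented images sharing an identity act as positives. The function name and the plain supervised-contrastive form are illustrative assumptions, not the paper's exact Fine-grained Contrastive Losses:

```python
import numpy as np

def supervised_contrastive_loss(embeddings, ids, temperature=0.1):
    """Toy contrastive loss: images sharing an identity (including
    clothing/pose-augmented copies) are treated as positives."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature          # temperature-scaled cosine similarities
    n = len(ids)
    loss = 0.0
    for i in range(n):
        pos = [j for j in range(n) if j != i and ids[j] == ids[i]]
        if not pos:
            continue
        # Softmax denominator over all other samples in the batch
        denom = np.sum([np.exp(sim[i, j]) for j in range(n) if j != i])
        loss += -np.mean([np.log(np.exp(sim[i, j]) / denom) for j in pos])
    return loss / n
```

In the CCPA setting, the batch would mix original images with their clothing- and pose-transferred counterparts, so the positives span multiple clothing conditions of the same person.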
Person re-identification (re-ID) continues to pose a significant challenge, particularly in scenarios involving occlusions. Prior approaches to tackling occlusions have predominantly focused on aligning physical body features by utilizing external semantic cues. However, these methods tend to be intricate and susceptible to noise. To address these challenges, we present an innovative end-to-end solution known as the Dynamic Patch-aware Enrichment Transformer (DPEFormer). This model automatically and dynamically distinguishes human body information from occlusions, eliminating the need for external detectors or precise image alignment. Specifically, we introduce a dynamic patch token selection module (DPSM). DPSM utilizes a label-guided proxy token as an intermediary to identify informative occlusion-free tokens, which are then used to derive local part features. To facilitate the seamless integration of global classification features with the finely detailed local features selected by DPSM, we introduce a novel feature blending module (FBM). FBM enhances feature representation through the complementary nature of information and the exploitation of part diversity. Furthermore, to ensure that DPSM and the entire DPEFormer can learn effectively with only identity labels, we also propose a Realistic Occlusion Augmentation (ROA) strategy. This strategy leverages recent advances in the Segment Anything Model (SAM). As a result, it generates occlusion images that closely resemble real-world occlusions, greatly enhancing the subsequent contrastive learning process. Experiments on occluded and holistic re-ID benchmarks signify a substantial advancement of DPEFormer over existing state-of-the-art approaches. The code will be made publicly available.
https://arxiv.org/abs/2402.10435
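The proxy-guided selection in DPSM can be sketched roughly as below. Scoring tokens by cosine similarity to the proxy and keeping a fixed ratio are simplifying assumptions for illustration, not the module's actual mechanism:

```python
import numpy as np

def select_patch_tokens(patch_tokens, proxy_token, keep_ratio=0.5):
    """Rank patch tokens by similarity to a label-guided proxy token and
    keep the highest-scoring (presumably occlusion-free) ones."""
    p = patch_tokens / np.linalg.norm(patch_tokens, axis=1, keepdims=True)
    q = proxy_token / np.linalg.norm(proxy_token)
    scores = p @ q                         # (n_tokens,) similarity to proxy
    k = max(1, int(len(scores) * keep_ratio))
    keep = np.argsort(-scores)[:k]         # indices of the retained tokens
    return patch_tokens[keep], keep
```

The retained tokens would then feed the local branch, while the discarded ones (ideally covering occluders and background) are excluded from part-feature extraction.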
Despite significant progress in optical character recognition (OCR) and computer vision systems, robustly recognizing text and identifying people in images taken in unconstrained \emph{in-the-wild} environments remain an ongoing challenge. However, such obstacles must be overcome in practical applications of vision systems, such as identifying racers in photos taken during off-road racing events. To this end, we introduce two new challenging real-world datasets - the off-road motorcycle Racer Number Dataset (RND) and the Muddy Racer re-iDentification Dataset (MUDD) - to highlight the shortcomings of current methods and drive advances in OCR and person re-identification (ReID) under extreme conditions. These two datasets feature over 6,300 images taken during off-road competitions which exhibit a variety of factors that undermine even modern vision systems, namely mud, complex poses, and motion blur. We establish benchmark performance on both datasets using state-of-the-art models. Off-the-shelf models transfer poorly, reaching only 15% end-to-end (E2E) F1 score on text spotting, and 33% rank-1 accuracy on ReID. Fine-tuning yields major improvements, bringing model performance to 53% F1 score for E2E text spotting and 79% rank-1 accuracy on ReID, but still falls short of good performance. Our analysis exposes open problems in real-world OCR and ReID that necessitate domain-targeted techniques. With these datasets and analysis of model limitations, we aim to foster innovations in handling real-world conditions like mud and complex poses to drive progress in robust computer vision. All data was sourced from this http URL, a website used by professional motorsports photographers, racers, and fans. The top-performing text spotting and ReID models are deployed on this platform to power real-time race photo search.
https://arxiv.org/abs/2402.08025
The acquisition of large-scale, precisely labeled datasets for person re-identification (ReID) poses a significant challenge. Weakly supervised ReID has begun to address this issue, although its performance lags behind fully supervised methods. In response, we introduce Contrastive Multiple Instance Learning (CMIL), a novel framework tailored for more effective weakly supervised ReID. CMIL distinguishes itself by requiring only a single model and no pseudo labels while leveraging contrastive losses -- a technique that has significantly enhanced traditional ReID performance yet is absent in all prior MIL-based approaches. Through extensive experiments and analysis across three datasets, CMIL not only matches state-of-the-art performance on the large-scale SYSU-30k dataset with fewer assumptions but also consistently outperforms all baselines on the WL-market1501 and Weakly Labeled MUddy racer re-iDentification dataset (WL-MUDD) datasets. We introduce and release the WL-MUDD dataset, an extension of the MUDD dataset featuring naturally occurring weak labels from the real-world application at this http URL. All our code and data are accessible at this https URL.
https://arxiv.org/abs/2402.07685
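As a rough illustration of combining multiple instance learning with a contrastive objective, the toy loss below mean-pools a weakly labeled bag of image features and contrasts it against identity prototypes. The pooling choice and prototype formulation are assumptions for illustration, not CMIL's actual losses:

```python
import numpy as np

def bag_contrastive_loss(bag_feats, bag_label, prototypes, temperature=0.1):
    """Toy MIL contrastive loss: aggregate a bag of image features that share
    one weak (bag-level) label, then contrast the pooled representation
    against a bank of identity prototypes via a softmax over similarities."""
    bag = bag_feats.mean(axis=0)                 # mean-pool the bag
    bag = bag / np.linalg.norm(bag)
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = protos @ bag / temperature          # (n_identities,)
    log_prob = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    return -log_prob[bag_label]                  # cross-entropy vs. bag label
```

Under weak supervision only the bag-level label is available, so no per-image pseudo labels are needed; the contrastive term operates on the pooled bag directly.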