Person re-identification (re-id), which aims to retrieve images of the same person as a given query image from a database, is one of the most practical image recognition applications. In the real world, however, the environments in which images are taken change over time. This causes a distribution shift between training and testing and degrades re-id performance. To maintain re-id performance, models should keep adapting to the temporal changes of the test environment. Test-time adaptation (TTA), which adapts models to the test environment using only unlabeled test data, is a promising way to handle this problem because it can adapt models on the fly at test time. However, previous TTA methods are designed for classification and cannot be directly applied to re-id: in re-id, the set of person identities differs between training and testing, whereas the set of classes is fixed in current TTA methods designed for classification. To improve re-id performance in changing test environments, we propose TEst-time similarity Modification for Person re-identification (TEMP), a novel TTA method for re-id. TEMP is the first fully test-time adaptation method for re-id and requires no modification to pre-training. Inspired by TTA methods that refine prediction uncertainty in classification, we aim to refine uncertainty in re-id. However, this uncertainty cannot be computed as in classification, since re-id is an open-set task in which person labels are not shared between training and testing. Hence, we propose re-id entropy, an alternative uncertainty measure for re-id computed from the similarities between feature vectors. Experiments show that re-id entropy measures uncertainty in re-id and that TEMP improves re-id performance in online settings where the distribution changes over time.
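The abstract does not spell out the formula, but a natural reading of re-id entropy is the Shannon entropy of a softmax over query-gallery feature similarities. A minimal PyTorch sketch under that assumption, where the cosine normalization and temperature `tau` are placeholders:

```python
import torch
import torch.nn.functional as F

def reid_entropy(query_feats: torch.Tensor,
                 gallery_feats: torch.Tensor,
                 tau: float = 0.1) -> torch.Tensor:
    """Entropy of the softmax over query-gallery similarities (sketch).

    A confident query concentrates the softmax mass on a few gallery
    entries (low entropy); an uncertain query spreads it out (high
    entropy). Minimizing this quantity at test time refines the
    uncertainty without needing shared person labels.
    """
    q = F.normalize(query_feats, dim=1)    # (B, D)
    g = F.normalize(gallery_feats, dim=1)  # (N, D)
    sims = q @ g.t() / tau                 # (B, N) scaled cosine similarities
    p = F.softmax(sims, dim=1)
    return -(p * torch.log(p + 1e-12)).sum(dim=1)  # (B,)
```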
https://arxiv.org/abs/2403.14114
We study the task of 3D multi-object re-identification from embodied tours. Specifically, an agent is given two tours of an environment (e.g. an apartment) under two different layouts (e.g. arrangements of furniture). Its task is to detect and re-identify objects in 3D - e.g. a "sofa" moved from location A to B, a new "chair" in the second layout at location C, or a "lamp" from location D in the first layout missing in the second. To support this task, we create an automated infrastructure to generate paired egocentric tours of initial/modified layouts in the Habitat simulator using Matterport3D scenes, YCB, and Google Scanned Objects. We present 3D Semantic MapNet (3D-SMNet) - a two-stage re-identification model consisting of (1) a 3D object detector that operates on RGB-D videos with known pose, and (2) a differentiable object matching module that solves correspondence estimation between two sets of 3D bounding boxes. Overall, 3D-SMNet builds object-based maps of each layout and then uses a differentiable matcher to re-identify objects across the tours. After training 3D-SMNet on our generated episodes, we demonstrate zero-shot transfer to real-world rearrangement scenarios by instantiating our task in Replica, Active Vision, and RIO environments depicting rearrangements. On all datasets, we find 3D-SMNet outperforms competitive baselines. Further, we show jointly training on real and generated episodes can lead to significant improvements over training on real data alone.
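The abstract does not detail the matcher's internals; a common differentiable choice for set-to-set correspondence is entropy-regularized optimal transport solved with Sinkhorn iterations. A minimal sketch under that assumption (unmatched-object "dustbin" handling omitted):

```python
import torch

def sinkhorn_match(cost: torch.Tensor, n_iters: int = 50,
                   eps: float = 0.1) -> torch.Tensor:
    """Differentiable soft matching between two sets of detections.

    cost: (M, N) pairwise matching costs, e.g. distances between the
    embeddings of M boxes from tour 1 and N boxes from tour 2.
    Returns a soft correspondence matrix with uniform marginals.
    """
    M, N = cost.shape
    K = torch.exp(-cost / eps)                     # Gibbs kernel
    r = torch.full((M,), 1.0 / M, device=cost.device)
    c = torch.full((N,), 1.0 / N, device=cost.device)
    v = torch.ones_like(c)
    for _ in range(n_iters):                       # alternating normalization
        u = r / (K @ v)
        v = c / (K.t() @ u)
    return u[:, None] * K * v[None, :]             # diag(u) @ K @ diag(v)
```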
https://arxiv.org/abs/2403.13190
In numerous studies, deep learning algorithms have proven their potential for the analysis of histopathology images, for example, for revealing the subtypes of tumors or the primary origin of metastases. These models require large datasets for training, which must be anonymized to prevent possible patient identity leaks. This study demonstrates that even relatively simple deep learning algorithms can re-identify patients in large histopathology datasets with substantial accuracy. We evaluated our algorithms on two TCIA datasets including lung squamous cell carcinoma (LSCC) and lung adenocarcinoma (LUAD). We also demonstrate the algorithm's performance on an in-house dataset of meningioma tissue. We predicted the source patient of a slide with F1 scores of 50.16% and 52.30% on the LSCC and LUAD datasets, respectively, and 62.31% on our meningioma dataset. Based on our findings, we formulated a risk assessment scheme to estimate the risk to the patient's privacy prior to publication.
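The paper's model details are not given in this abstract; as a hedged illustration only, such an attack can be framed as retrieval over slide embeddings, assigning a query slide to the patient of its nearest reference slide. A sketch, assuming a feature extractor is already given:

```python
import numpy as np

def predict_source_patient(query_feat: np.ndarray,
                           ref_feats: np.ndarray,
                           ref_patient_ids: np.ndarray):
    """Nearest-neighbor patient re-identification over slide embeddings.

    query_feat:      (D,) embedding of the query slide or tile
    ref_feats:       (N, D) embeddings of reference slides
    ref_patient_ids: (N,) patient ID of each reference slide
    """
    q = query_feat / (np.linalg.norm(query_feat) + 1e-12)
    r = ref_feats / (np.linalg.norm(ref_feats, axis=1, keepdims=True) + 1e-12)
    sims = r @ q                    # cosine similarity to every reference
    return ref_patient_ids[int(np.argmax(sims))]
```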
https://arxiv.org/abs/2403.12816
Visible-Infrared Person Re-identification (VI-ReID) is a challenging cross-modal pedestrian retrieval task, due to significant intra-class variations and cross-modal discrepancies among different cameras. Existing works mainly focus on embedding images of different modalities into a unified space to mine modality-shared features. They only seek distinctive information within these shared features, while ignoring the identity-aware useful information that is implicit in the modality-specific features. To address this issue, we propose a novel Implicit Discriminative Knowledge Learning (IDKL) network to uncover and leverage the implicit discriminative information contained within the modality-specific features. First, we extract modality-specific and modality-shared features using a novel dual-stream network. Then, the modality-specific features undergo purification to reduce their modality style discrepancies while preserving identity-aware discriminative knowledge. Subsequently, this implicit knowledge is distilled into the modality-shared feature to enhance its distinctiveness. Finally, an alignment loss is proposed to minimize modality discrepancy on the enhanced modality-shared features. Extensive experiments on multiple public datasets demonstrate the superiority of the IDKL network over state-of-the-art methods. Code is available at this https URL.
https://arxiv.org/abs/2403.11708
Person Re-identification (ReID) has been extensively developed for a decade in order to learn the association of images of the same person across non-overlapping camera views. To overcome significant variations between images across camera views, numerous variants of ReID models have been developed to solve a number of challenges, such as resolution change, clothing change, occlusion, modality change, and so on. Despite the impressive performance of many ReID variants, these variants typically function distinctly and cannot be applied to other challenges. To the best of our knowledge, there is no versatile ReID model that can handle various ReID challenges at the same time. This work makes the first attempt at learning a versatile ReID model to solve such a problem. Our main idea is to form a two-stage prompt-based twin modeling framework called VersReID. VersReID first leverages the scene label to train a ReID Bank that contains abundant knowledge for handling various scenes, where several groups of scene-specific prompts are used to encode different scene-specific knowledge. In the second stage, we distill a V-Branch model with versatile prompts from the ReID Bank for adaptively solving the ReID of different scenes, eliminating the demand for scene labels during the inference stage. To facilitate training VersReID, we further introduce multi-scene properties into self-supervised learning of ReID via a multi-scene prioris data augmentation (MPDA) strategy. Through extensive experiments, we demonstrate the success of learning an effective and versatile ReID model that handles ReID tasks under multi-scene conditions without manual assignment of scene labels in the inference stage, including general, low-resolution, clothing change, occlusion, and cross-modality scenes. Codes and models are available at this https URL.
https://arxiv.org/abs/2403.11121
A key challenge in visible-infrared person re-identification (V-I ReID) is training a backbone model capable of effectively addressing the significant discrepancies across modalities. State-of-the-art methods that generate a single intermediate bridging domain are often less effective, as this generated domain may not adequately capture sufficient common discriminant information. This paper introduces Bidirectional Multi-step Domain Generalization (BMDG), a novel approach for unifying feature representations across diverse modalities. BMDG creates multiple virtual intermediate domains by finding and aligning body part features extracted from both I and V modalities, reducing the modality gap in two steps. First, it aligns modalities in feature space by learning shared and modality-invariant body part prototypes from V and I images. Then, it generalizes the feature representation by applying bidirectional multi-step learning, which progressively refines feature representations in each step and incorporates more prototypes from both modalities. In particular, our method minimizes the cross-modal gap by identifying and aligning shared prototypes that capture key discriminative features across modalities, then uses multiple bridging steps based on this information to enhance the feature representation. Experiments conducted on challenging V-I ReID datasets indicate that our BMDG approach outperforms state-of-the-art part-based models and methods that generate an intermediate domain for V-I person ReID.
https://arxiv.org/abs/2403.10782
Single-modal object re-identification (ReID) faces great challenges in maintaining robustness within complex visual scenarios. In contrast, multi-modal object ReID utilizes complementary information from diverse modalities, showing great potential for practical applications. However, previous methods may be easily affected by irrelevant backgrounds and usually ignore modality gaps. To address the above issues, we propose a novel learning framework named EDITOR to select diverse tokens from vision Transformers for multi-modal object ReID. We begin with a shared vision Transformer to extract tokenized features from different input modalities. Then, we introduce a Spatial-Frequency Token Selection (SFTS) module to adaptively select object-centric tokens with both spatial and frequency information. Afterwards, we employ a Hierarchical Masked Aggregation (HMA) module to facilitate feature interactions within and across modalities. Finally, to further reduce the effect of backgrounds, we propose a Background Consistency Constraint (BCC) and an Object-Centric Feature Refinement (OCFR). They are formulated as two new loss functions, which improve feature discrimination with background suppression. As a result, our framework can generate more discriminative features for multi-modal object ReID. Extensive experiments on three multi-modal ReID benchmarks verify the effectiveness of our methods. The code is available at this https URL.
https://arxiv.org/abs/2403.10254
Lifelong person re-identification (LReID) assumes a practical scenario where the model is sequentially trained on continuously incoming datasets while alleviating catastrophic forgetting on the old datasets. However, not only the training datasets but also the gallery images are incrementally accumulated, which requires a huge amount of computation and storage to re-extract features at the inference phase. In this paper, we address this problem by incorporating backward-compatibility into LReID for the first time. We train the model on the continuously incoming datasets while maintaining its compatibility with previously trained old models, without re-computing the features of the old gallery images. To this end, we devise a cross-model compatibility loss based on contrastive learning with respect to replay features across all the old datasets. Moreover, we develop a knowledge consolidation method based on part classification to learn the shared representation across different datasets for backward-compatibility. We also suggest a more practical evaluation methodology in which all gallery and query images are considered together. Experimental results demonstrate that the proposed method achieves significantly higher backward-compatibility performance than existing methods. It is a promising tool for more practical scenarios of LReID.
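The exact form of the cross-model compatibility loss is not given in this abstract; one plausible InfoNCE-style instantiation pulls each new-model embedding toward the frozen old-model replay feature of the same identity, so old gallery features never need re-extraction. A sketch under that assumption, where the replay-bank bookkeeping is hypothetical:

```python
import torch
import torch.nn.functional as F

def cross_model_compat_loss(new_feats: torch.Tensor,
                            old_replay_feats: torch.Tensor,
                            labels: torch.Tensor,
                            tau: float = 0.07) -> torch.Tensor:
    """Contrastive alignment of new-model features to old-model replay features.

    new_feats:        (B, D) features from the model being trained
    old_replay_feats: (M, D) frozen replay features from the old model
    labels:           (B,) index of each sample's identity in the replay bank
    """
    q = F.normalize(new_feats, dim=1)
    k = F.normalize(old_replay_feats, dim=1)
    logits = q @ k.t() / tau            # (B, M) similarity logits
    return F.cross_entropy(logits, labels)
```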
https://arxiv.org/abs/2403.10022
Cloth-changing person re-identification aims to retrieve and identify specific pedestrians by using cloth-irrelevant features in person cloth-changing scenarios. However, pedestrian images captured by surveillance probes usually contain occlusions in real-world scenarios. The performance of existing cloth-changing re-identification methods is significantly degraded due to the reduction of discriminative cloth-irrelevant features caused by occlusion. We define cloth-changing person re-identification in occlusion scenarios as occluded cloth-changing person re-identification (Occ-CC-ReID), and to the best of our knowledge, we are the first to propose occluded cloth-changing person re-identification as a new task. We constructed two occluded cloth-changing person re-identification datasets for different occlusion scenarios: Occluded-PRCC and Occluded-LTCC. The datasets can be obtained from the following link: this https URL Re-Identification.
https://arxiv.org/abs/2403.08557
Cloth-Changing Person Re-Identification (CC-ReID) aims to accurately identify the target person in more realistic surveillance scenarios, where pedestrians usually change their clothing. Despite great progress, limited cloth-changing training samples in existing CC-ReID datasets still prevent the model from adequately learning cloth-irrelevant features. In addition, due to the absence of explicit supervision to keep the model constantly focused on cloth-irrelevant areas, existing methods are still hampered by the disruption of clothing variations. To solve the above issues, we propose an Identity-aware Dual-constraint Network (IDNet) for the CC-ReID task. Specifically, to help the model extract cloth-irrelevant clues, we propose a Clothes Diversity Augmentation (CDA), which generates more realistic cloth-changing samples by enriching the clothing color while preserving the texture. In addition, a Multi-scale Constraint Block (MCB) is designed, which extracts fine-grained identity-related features and effectively transfers cloth-irrelevant knowledge. Moreover, a Counterfactual-guided Attention Module (CAM) is presented, which learns cloth-irrelevant features from channel and space dimensions and utilizes the counterfactual intervention for supervising the attention map to highlight identity-related regions. Finally, a Semantic Alignment Constraint (SAC) is designed to facilitate high-level semantic feature interaction. Comprehensive experiments on four CC-ReID datasets indicate that our method outperforms prior state-of-the-art approaches.
https://arxiv.org/abs/2403.08270
Due to the needs of road traffic flow monitoring and public safety management, video surveillance cameras are widely distributed along urban roads. However, the information captured directly by each camera is siloed, making it difficult to use effectively. Vehicle re-identification refers to finding, under one camera, a vehicle that has appeared under another, thereby correlating the information captured by multiple cameras. While license plate recognition plays an important role in some applications, there are scenarios where re-identification methods based on vehicle appearance are more suitable. The main challenge is that vehicle appearance data exhibits high inter-class similarity and large intra-class differences, so it is difficult to accurately distinguish between different vehicles by relying on appearance alone. In such cases, it is often necessary to introduce extra information, such as spatio-temporal cues. Notably, the relative positions of vehicles rarely change when passing through two adjacent cameras in the bridge scenario. In this paper, we present a vehicle re-identification method based on flock similarity, which improves accuracy by utilizing the information of vehicles adjacent to the target vehicle. When the relative positions of the vehicles remain unchanged and the flock size is appropriate, we obtain an average relative improvement of 204% on the VeRi dataset in our experiments. We then discuss the effect of the magnitude of the change in the vehicles' relative positions as they pass through two cameras, and present two metrics that quantify this difference and establish a connection between them. Although our assumption is based on the bridge scenario, it often holds in other scenarios as well, owing to driving safety and camera placement.
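The abstract leaves the exact scoring rule implicit; one simple way to read flock similarity is to blend the target's own appearance similarity with the evidence contributed by its neighboring vehicles. A sketch, where the neighbor bookkeeping and the mixing weight `alpha` are assumptions:

```python
import numpy as np

def flock_score(sim_target: np.ndarray,
                sims_neighbors: list,
                alpha: float = 0.5) -> np.ndarray:
    """Blend a vehicle's own similarity with its flock's evidence (sketch).

    sim_target:     (N,) appearance similarity of the target vehicle to
                    each of N candidates under the second camera.
    sims_neighbors: list of (N,) arrays, one per vehicle adjacent to the
                    target, giving that neighbor's best-match similarity
                    within each candidate's neighborhood.
    """
    if not sims_neighbors:
        return sim_target
    flock = np.mean(np.stack(sims_neighbors), axis=0)  # average neighbor evidence
    return alpha * sim_target + (1.0 - alpha) * flock
```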
https://arxiv.org/abs/2403.07752
Perspective distortion (PD) causes unprecedented changes in shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Synthesizing perspective distortion requires precisely estimating camera intrinsic and extrinsic parameters, which is itself a challenging task, and the non-availability of dedicated training data poses a critical barrier to developing robust computer vision methods. Additionally, distortion correction methods turn other computer vision tasks into multi-step pipelines and lack performance. In this work, we propose mitigating perspective distortion (MPD) by employing fine-grained parameter control over a specific family of Möbius transforms to model real-world distortion, without estimating camera intrinsic and extrinsic parameters and without the need for actual distorted data. We also present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, to benchmark the robustness of deep learning models against this kind of distortion. The proposed method outperforms prior approaches on the existing benchmarks ImageNet-E and ImageNet-X. Additionally, it significantly improves performance on ImageNet-PD while performing consistently on the standard data distribution. Further, our method shows improved performance on three PD-affected real-world applications: crowd counting, fisheye image recognition, and person re-identification. We will release the source code, dataset, and models to foster further research.
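The specific parameter family is defined in the paper; the general Möbius transform on image coordinates treated as complex numbers is f(z) = (az + b)/(cz + d) with ad - bc ≠ 0. A minimal warping sketch, where the parameter values are placeholders:

```python
import numpy as np

def mobius_grid(h: int, w: int, a=1 + 0j, b=0j, c=5e-4j, d=1 + 0j):
    """Sampling grid for f(z) = (az + b) / (cz + d) over pixel coordinates.

    Each pixel (x, y) is treated as z = x + iy; a small |c| bends the
    image, loosely mimicking perspective-like distortion. Feeding the
    returned (map_x, map_y) to cv2.remap samples the source at f(z),
    i.e. it applies the inverse transform (dz - b) / (-cz + a) to the
    image; swap in that inverse map for the forward warp.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    z = xs + 1j * ys
    fz = (a * z + b) / (c * z + d)
    return fz.real.astype(np.float32), fz.imag.astype(np.float32)
```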
https://arxiv.org/abs/2405.02296
Visible-infrared person re-identification (VI-ReID) is challenging due to considerable cross-modality discrepancies. Existing works mainly focus on learning modality-invariant features while suppressing modality-specific ones. However, retrieving visible images based only on infrared samples is an extreme problem because of the absence of color information. To this end, we present the Refer-VI-ReID setting, which aims to match target visible images from both infrared images and coarse language descriptions (e.g., "a man with red top and black pants") to complement the missing color information. To address this task, we design a Y-Y-shape decomposition structure, dubbed YYDS, to decompose and aggregate texture and color features of targets. Specifically, a text-IoU regularization strategy is first presented to facilitate the decomposition training, and a joint relation module is then proposed to infer the aggregation. Furthermore, a cross-modal version of the k-reciprocal re-ranking algorithm, named CMKR, is investigated, in which three neighbor search strategies and one local query expansion method are explored to alleviate the modality bias problem of the near neighbors. We conduct experiments on the SYSU-MM01, RegDB and LLCM datasets with our manually annotated descriptions. Both YYDS and CMKR achieve remarkable improvements over SOTA methods on all three datasets. Codes are available at this https URL.
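The cross-modal extension is the paper's contribution; the underlying k-reciprocal idea (Zhong et al., CVPR 2017) keeps a gallery item as a neighbor only if it also ranks the query among its own top-k. A minimal sketch of that building block:

```python
import numpy as np

def k_reciprocal_neighbors(dist: np.ndarray, q: int, k: int) -> np.ndarray:
    """Indices g such that g is in q's top-k AND q is in g's top-k.

    dist: (n, n) pairwise distance matrix over query + gallery features.
    Re-ranking then re-weights distances using these mutual neighbors.
    """
    forward = np.argsort(dist[q])[:k + 1]           # q's k nearest (incl. self)
    mutual = [g for g in forward
              if q in np.argsort(dist[g])[:k + 1]]  # reciprocal check
    return np.asarray(mutual)
```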
https://arxiv.org/abs/2403.04183
This work addresses the task of long-term person re-identification. Typically, person re-identification assumes that people do not change their clothes, which limits its applications to short-term scenarios. To overcome this limitation, we investigate long-term person re-identification, which considers both clothes-changing and clothes-consistent scenarios. In this paper, we propose a novel framework that effectively learns and utilizes both global and local information. The proposed framework consists of three streams: global, local body part, and head streams. The global and head streams encode identity-relevant information from an entire image and a cropped image of the head region, respectively. Both streams encode the most distinct, less distinct, and average features using combinations of adversarial erasing, max pooling, and average pooling. The local body part stream extracts identity-related information for each body part, allowing it to be compared with the same body part from another image. Since body part annotations are not available in re-identification datasets, pseudo-labels are generated using clustering. These labels are then utilized to train a body part segmentation head in the local body part stream. The proposed framework is trained by backpropagating the weighted sum of the identity classification loss, the pair-based loss, and the pseudo body part segmentation loss. To demonstrate the effectiveness of the proposed method, we conducted experiments on three publicly available datasets (Celeb-reID, PRCC, and VC-Clothes). The experimental results demonstrate that the proposed method outperforms the previous state-of-the-art method.
https://arxiv.org/abs/2403.02892
Recent unsupervised person re-identification (re-ID) methods achieve high performance by leveraging fine-grained local context; these methods are referred to as part-based methods. However, most part-based methods obtain local contexts through horizontal division, which suffers from misalignment due to various human poses. Additionally, the misalignment of semantic information in part features restricts the use of metric learning, thus affecting the effectiveness of part-based methods. The two issues mentioned above result in the under-utilization of part features in part-based methods. We introduce the Spatial Cascaded Clustering and Weighted Memory (SCWM) method to address these challenges. SCWM aims to parse and align more accurate local contexts for different human body parts while allowing the memory module to balance hard example mining and noise suppression. Specifically, we first analyze the foreground omission and spatial confusion issues in previous methods. Then, we propose foreground and space corrections to enhance the completeness and reasonableness of the human parsing results. Next, we introduce a weighted memory and utilize two weighting strategies. These strategies address hard sample mining for global features and enhance noise resistance for part features, enabling better utilization of both global and part features. Extensive experiments on Market-1501 and MSMT17 validate the proposed method's effectiveness over many state-of-the-art methods.
https://arxiv.org/abs/2403.00261
Unsupervised visible-infrared person re-identification (USVI-ReID) aims to match specified people in infrared images to visible images without annotation, and vice versa. USVI-ReID is a challenging yet under-explored task. Most existing methods address the USVI-ReID problem using cluster-based contrastive learning, which simply employs the cluster center as a representation of a person. However, the cluster center primarily focuses on shared information, overlooking disparity. To address the problem, we propose a Progressive Contrastive Learning with Multi-Prototype (PCLMP) method for USVI-ReID. In brief, we first generate the hard prototype by selecting the sample with the maximum distance from the cluster center. This hard prototype is used in the contrastive loss to emphasize disparity. Additionally, instead of rigidly aligning query images to a specific prototype, we generate the dynamic prototype by randomly picking samples within a cluster. This dynamic prototype is used to retain the natural variety of features while reducing instability in the simultaneous learning of both common and disparate information. Finally, we introduce a progressive learning strategy to gradually shift the model's attention towards hard samples, avoiding cluster deterioration. Extensive experiments conducted on the publicly available SYSU-MM01 and RegDB datasets validate the effectiveness of the proposed method. PCLMP outperforms the existing state-of-the-art method with an average mAP improvement of 3.9%. The source codes will be released.
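The two prototype constructions are stated directly in the abstract: the hard prototype is the cluster member farthest from its center, and the dynamic prototype is a randomly drawn member. A minimal sketch, assuming cluster assignments are already given:

```python
import torch

def hard_and_dynamic_prototypes(feats: torch.Tensor):
    """Build the two prototypes of PCLMP for one cluster (sketch).

    feats: (K, D) features of the samples assigned to one cluster.
    Returns (hard, dynamic): the member with maximum distance from the
    cluster center, and a uniformly sampled member.
    """
    center = feats.mean(dim=0, keepdim=True)            # (1, D) cluster center
    dists = (feats - center).norm(dim=1)                # (K,) distances to center
    hard = feats[dists.argmax()]                        # farthest sample
    dynamic = feats[torch.randint(len(feats), (1,))].squeeze(0)
    return hard, dynamic
```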
https://arxiv.org/abs/2402.19026
Online Unsupervised Domain Adaptation (OUDA) for person Re-Identification (Re-ID) is the task of continuously adapting a model trained on a well-annotated source domain dataset to a target domain observed as a data stream. In OUDA, person Re-ID models face two main challenges: catastrophic forgetting and domain shift. In this work, we propose a new Source-guided Similarity Preservation (S2P) framework to alleviate these two problems. Our framework is based on the extraction of a support set composed of source images that maximizes the similarity with the target data. This support set is used to identify feature similarities that must be preserved during the learning process. S2P can incorporate multiple existing UDA methods to mitigate catastrophic forgetting. Our experiments show that S2P outperforms previous state-of-the-art methods on multiple real-to-real and synthetic-to-real challenging OUDA benchmarks.
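The abstract describes the support set as the source images that maximize similarity with the target data; a minimal sketch of that selection step, where the batch-level cosine scoring is an assumption:

```python
import torch
import torch.nn.functional as F

def select_support_set(source_feats: torch.Tensor,
                       target_feats: torch.Tensor,
                       m: int) -> torch.Tensor:
    """Pick the m source images most similar to the current target stream.

    source_feats: (S, D) features of source-domain images
    target_feats: (B, D) features of the incoming target batch
    Returns indices into source_feats forming the support set.
    """
    s = F.normalize(source_feats, dim=1)
    t = F.normalize(target_feats, dim=1)
    best = (s @ t.t()).max(dim=1).values   # best target match per source image
    return best.topk(m).indices
```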
https://arxiv.org/abs/2402.15206
Long-term Person Re-Identification (LRe-ID) aims at matching an individual across cameras after a long period of time, under variations in clothing, pose, and viewpoint. In this work, we propose CCPA: a Contrastive Clothing and Pose Augmentation framework for LRe-ID. Beyond appearance, CCPA captures cloth-invariant body shape information using a Relation Graph Attention Network. Training a robust LRe-ID model requires a wide range of clothing variations and expensive clothing labels, which current LRe-ID datasets lack. To address this, we perform clothing and pose transfer across identities to generate images with more clothing variations and of different persons wearing similar clothing. The augmented batch of images serves as input to our proposed Fine-grained Contrastive Losses, which not only supervise the Re-ID model to learn discriminative person embeddings under long-term scenarios but also ensure in-distribution data generation. Results on LRe-ID datasets demonstrate the effectiveness of our CCPA framework.
https://arxiv.org/abs/2402.14454
AI-based Face Recognition Systems (FRSs) are now widely distributed and deployed as MLaaS solutions all over the world, more so since the COVID-19 pandemic, for tasks ranging from validating individuals' faces while buying SIM cards to surveillance of citizens. Extensive biases have been reported against marginalized groups in these systems, leading to highly discriminatory outcomes. The post-pandemic world has normalized wearing face masks, but FRSs have not kept up with the changing times. As a result, these systems are susceptible to mask-based face occlusion. In this study, we audit four commercial and nine open-source FRSs on the task of face re-identification between different varieties of masked and unmasked images across five benchmark datasets (14,722 images in total). These simulate a realistic validation/surveillance task as deployed in all major countries around the world. Three of the commercial and five of the open-source FRSs are highly inaccurate; they further perpetuate biases against non-White individuals, with the lowest accuracy being 0%. A survey on the same task with 85 human participants also yields a low accuracy of 40%. Thus, human-in-the-loop moderation in the pipeline does not alleviate the concerns, as has frequently been hypothesized in the literature. Our large-scale study shows that developers, lawmakers, and users of such services need to rethink the design principles behind FRSs, especially for the task of face re-identification, taking cognizance of the observed biases.
https://arxiv.org/abs/2402.13771
Preservation of private user data is of paramount importance for high Quality of Experience (QoE) and acceptability, particularly for services treating sensitive data, such as IT-based health services. Whereas anonymization techniques have been shown to be prone to data re-identification, synthetic data generation has gradually replaced anonymization, since it is relatively less time- and resource-consuming and more robust to data leakage. Generative Adversarial Networks (GANs) have been used for generating synthetic datasets, especially GAN frameworks adhering to differential privacy. This research compares state-of-the-art GAN-based models for synthetic data generation in producing time-series synthetic medical records of dementia patients, which can be distributed without privacy concerns. Predictive modeling, autocorrelation, and distribution analysis are used to assess the Quality of Generating (QoG) of the generated data. The privacy preservation of the respective models is assessed by applying membership inference attacks to determine potential data leakage risks. Our experiments indicate the superiority of the privacy-preserving GAN (PPGAN) model over other models regarding privacy preservation while maintaining an acceptable level of QoG. The presented results can support better data protection for medical use cases in the future.
https://arxiv.org/abs/2402.14042