Abstract
A key challenge in visible-infrared person re-identification (V-I ReID) is training a backbone model that can effectively address the significant discrepancies between modalities. State-of-the-art methods that generate a single intermediate bridging domain are often less effective, because this generated domain may not capture sufficient common discriminative information. This paper introduces Bidirectional Multi-step Domain Generalization (BMDG), a novel approach for unifying feature representations across modalities. BMDG creates multiple virtual intermediate domains by finding and aligning body-part features extracted from both the I and V modalities. BMDG reduces the modality gap in two steps. First, it aligns the modalities in feature space by learning shared, modality-invariant body-part prototypes from V and I images. Then, it generalizes the feature representation through bidirectional multi-step learning, which progressively refines the representation at each step by incorporating more prototypes from both modalities. In particular, our method minimizes the cross-modal gap by identifying and aligning shared prototypes that capture key discriminative features across modalities, and then uses multiple bridging steps built on this information to enhance the feature representation. Experiments on challenging V-I ReID datasets indicate that BMDG outperforms state-of-the-art part-based models and methods that generate an intermediate domain for V-I person ReID.
URL
https://arxiv.org/abs/2403.10782