Abstract
Unsupervised learning visible-infrared person re-identification (USL-VI-ReID) aims to learn modality-invariant features from unlabeled cross-modality datasets and thereby reduce the inter-modality gap. However, existing methods either lack cross-modality clustering or pursue cluster-level association too aggressively, which makes reliable modality-invariant feature learning difficult. To address this issue, we propose an Extended Cross-Modality United Learning (ECUL) framework incorporating an Extended Modality-Camera Clustering (EMCC) module and a Two-Step Memory Updating Strategy (TSMem) module. Specifically, ECUL naturally integrates intra-modality clustering, inter-modality clustering, and inter-modality instance selection, establishing compact and accurate cross-modality associations while limiting the introduction of noisy labels. Moreover, EMCC captures and filters neighborhood relationships by extending the encoding vector, which further promotes the learning of modality-invariant and camera-invariant knowledge at the clustering level. Finally, TSMem provides accurate and generalized proxy points for contrastive learning by updating the memory in stages. Extensive experiments on the SYSU-MM01 and RegDB datasets demonstrate that ECUL achieves promising performance and even outperforms certain supervised methods.
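To make the TSMem idea concrete, below is a minimal sketch of memory-based contrastive learning with a staged memory update. The abstract only states that proxy points are obtained "by updating the memory in stages"; the concrete schedule here (a plain averaging step early in training, then a momentum step once pseudo-labels stabilize), along with the class name, `switch_epoch`, and all hyperparameter values, are illustrative assumptions rather than the paper's exact rule.

```python
# Sketch of a two-stage cluster-memory update for contrastive learning.
# Assumed design, not the paper's specification: stage boundaries and
# update rules are placeholders chosen for illustration.

import torch
import torch.nn.functional as F

class TwoStepMemory:
    def __init__(self, num_clusters: int, feat_dim: int,
                 switch_epoch: int = 20, momentum: float = 0.9, temp: float = 0.05):
        # One L2-normalized proxy vector per pseudo-label cluster.
        self.proxies = F.normalize(torch.randn(num_clusters, feat_dim), dim=1)
        self.switch_epoch = switch_epoch   # when to switch update rules (assumed)
        self.momentum = momentum           # momentum coefficient (assumed value)
        self.temp = temp                   # softmax temperature for the loss

    def contrastive_loss(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # InfoNCE-style loss: each feature is pulled toward its cluster proxy
        # and pushed from all other proxies in the memory.
        logits = feats @ self.proxies.t() / self.temp
        return F.cross_entropy(logits, labels)

    @torch.no_grad()
    def update(self, feats: torch.Tensor, labels: torch.Tensor, epoch: int) -> None:
        for f, y in zip(feats, labels):
            if epoch < self.switch_epoch:
                # Stage 1: equal-weight average pulls proxies quickly toward
                # fresh features while pseudo-labels are still unstable.
                new = 0.5 * self.proxies[y] + 0.5 * f
            else:
                # Stage 2: momentum update keeps proxies smooth and generalized
                # once cross-modality clusters have stabilized.
                new = self.momentum * self.proxies[y] + (1.0 - self.momentum) * f
            self.proxies[y] = F.normalize(new, dim=0)
```

In a USL-VI-ReID pipeline of this kind, features from both visible and infrared images would first receive pseudo-labels from the clustering stage; each mini-batch then calls `contrastive_loss` for the gradient step and `update` to refresh the proxies.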
URL
https://arxiv.org/abs/2412.19134