Abstract
Visible-infrared person re-identification (VI-ReID) is challenging due to considerable cross-modality discrepancies. Existing works mainly focus on learning modality-invariant features while suppressing modality-specific ones. However, retrieving visible images based only on infrared samples is an extreme case of this problem because of the absence of color information. To this end, we present the Refer-VI-ReID setting, which aims to match target visible images using both infrared images and coarse language descriptions (e.g., "a man with a red top and black pants") to complement the missing color information. To address this task, we design a Y-Y-shaped decomposition structure, dubbed YYDS, to decompose and aggregate the texture and color features of targets. Specifically, a text-IoU regularization strategy is first presented to facilitate the decomposition training, and a joint relation module is then proposed to infer the aggregation. Furthermore, a cross-modal version of the k-reciprocal re-ranking algorithm, named CMKR, is investigated, in which three neighbor search strategies and one local query expansion method are explored to alleviate the modality bias problem of near neighbors. We conduct experiments on the SYSU-MM01, RegDB and LLCM datasets with our manually annotated descriptions. Both YYDS and CMKR achieve remarkable improvements over SOTA methods on all three datasets. Code is available at this https URL.
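The abstract does not detail CMKR itself, but it builds on the standard k-reciprocal re-ranking idea: a gallery sample counts as a reliable match for a query only if each appears in the other's top-k nearest neighbors. A minimal sketch of that mutual-neighbor step (the function name and the single square distance matrix over all probe and gallery samples are assumptions for illustration, not the paper's CMKR):

```python
import numpy as np

def k_reciprocal_neighbors(dist, k):
    """For each sample i, return the indices j such that i and j are
    mutually within each other's top-k nearest neighbors.

    dist: square (n, n) pairwise distance matrix over all samples.
    """
    # rank[i] lists sample indices sorted by ascending distance from i
    rank = np.argsort(dist, axis=1)
    topk = rank[:, :k]
    neighbors = []
    for i in range(dist.shape[0]):
        # keep only mutual (k-reciprocal) neighbors of i
        recip = [j for j in topk[i] if i in topk[j]]
        neighbors.append(recip)
    return neighbors
```

In the cross-modal setting the paper targets, the near neighbors of an infrared query are biased toward the infrared modality, which is why CMKR explores alternative neighbor search strategies and a local query expansion on top of this basic mutual-neighbor criterion.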
URL
https://arxiv.org/abs/2403.04183