Paper Reading AI Learner

YYDS: Visible-Infrared Person Re-Identification with Coarse Descriptions

2024-03-07 03:26:02
Yunhao Du, Zhicheng Zhao, Fei Su

Abstract

Visible-infrared person re-identification (VI-ReID) is challenging due to considerable cross-modality discrepancies. Existing works mainly focus on learning modality-invariant features while suppressing modality-specific ones. However, retrieving visible images from infrared samples alone is an extreme problem because of the absence of color information. To this end, we present the Refer-VI-ReID setting, which aims to match target visible images given both infrared images and coarse language descriptions (e.g., "a man with red top and black pants") that complement the missing color information. To address this task, we design a Y-Y-shaped decomposition structure, dubbed YYDS, to decompose and aggregate texture and color features of targets. Specifically, a text-IoU regularization strategy is first presented to facilitate the decomposition training, and a joint relation module is then proposed to perform the aggregation. Furthermore, we investigate a cross-modal version of the k-reciprocal re-ranking algorithm, named CMKR, in which three neighbor search strategies and one local query expansion method are explored to alleviate the modality bias of near neighbors. We conduct experiments on the SYSU-MM01, RegDB and LLCM datasets with our manually annotated descriptions. Both YYDS and CMKR achieve remarkable improvements over SOTA methods on all three datasets. Codes are available at this https URL.
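The abstract does not detail CMKR, but it builds on the standard k-reciprocal re-ranking idea: a gallery sample j is a k-reciprocal neighbor of a query i only if i also appears among j's own k nearest neighbors, which filters out one-sided (and, cross-modally, modality-biased) matches. A minimal sketch of that reciprocity test, assuming a plain pairwise distance matrix (the function name and setup are illustrative, not the paper's code):

```python
import numpy as np

def k_reciprocal_neighbors(dist, i, k):
    """Return the k-reciprocal neighbor set of sample i.

    dist: (N, N) pairwise distance matrix (zero diagonal).
    A neighbor j of i is kept only if i is also among
    j's own k nearest neighbors (mutual-rank filtering).
    """
    # forward k-NN of i (includes i itself via the zero diagonal)
    forward = np.argsort(dist[i])[: k + 1]
    # keep only mutual neighbors
    recip = [j for j in forward if i in np.argsort(dist[j])[: k + 1]]
    return np.asarray(recip)

# toy example: 6 random feature vectors, Euclidean distances
rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
dist = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
print(k_reciprocal_neighbors(dist, 0, k=2))
```

The cross-modal variant in the paper would additionally have to handle query and gallery samples coming from different modalities (infrared vs. visible), where nearest-neighbor lists tend to be dominated by same-modality samples; the three neighbor search strategies and local query expansion mentioned in the abstract target exactly that bias.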


URL

https://arxiv.org/abs/2403.04183

PDF

https://arxiv.org/pdf/2403.04183.pdf

