Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification

2024-03-18 12:12:45
Kaijie Ren, Lei Zhang

Abstract

Visible-Infrared Person Re-identification (VI-ReID) is a challenging cross-modal pedestrian retrieval task because of significant intra-class variations and cross-modal discrepancies among different cameras. Existing works mainly focus on embedding images of different modalities into a unified space to mine modality-shared features. They seek distinctive information only within these shared features and ignore the useful identity-aware information that is implicit in the modality-specific features. To address this issue, we propose a novel Implicit Discriminative Knowledge Learning (IDKL) network to uncover and leverage the implicit discriminative information contained within the modality-specific features. First, we extract modality-specific and modality-shared features using a novel dual-stream network. Then, the modality-specific features undergo purification to reduce their modality style discrepancies while preserving identity-aware discriminative knowledge. Subsequently, this implicit knowledge is distilled into the modality-shared features to enhance their distinctiveness. Finally, an alignment loss is proposed to minimize the modality discrepancy on the enhanced modality-shared features. Extensive experiments on multiple public datasets demonstrate the superiority of the IDKL network over state-of-the-art methods. Code is available at this https URL.
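
The abstract does not give the form of the distillation or alignment losses, so the following is only an illustrative sketch of the last two steps under stated assumptions, not the authors' implementation. It assumes a generic temperature-softened KL distillation from the modality-specific identity logits (teacher) to the modality-shared identity logits (student), and a simple alignment term that compares the mean shared feature of each modality. The function names, the temperature value, and both loss forms are hypothetical stand-ins.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(specific_logits, shared_logits, temperature=4.0):
    """KL(teacher || student) on softened identity predictions.

    Assumption: the modality-specific branch acts as the teacher and the
    modality-shared branch as the student. The paper may use another form.
    """
    teacher = softmax(specific_logits, temperature)
    student = softmax(shared_logits, temperature)
    return sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)


def mean_vector(features):
    """Column-wise mean of a list of equal-length feature vectors."""
    count = len(features)
    return [sum(column) / count for column in zip(*features)]


def alignment_loss(visible_feats, infrared_feats):
    """Squared Euclidean distance between the per-modality mean features.

    Assumption: a stand-in for the alignment loss named in the abstract.
    """
    mean_visible = mean_vector(visible_feats)
    mean_infrared = mean_vector(infrared_feats)
    return sum((a - b) ** 2 for a, b in zip(mean_visible, mean_infrared))
```

In this sketch the distillation term is zero when the two branches predict the same identity distribution, and the alignment term is zero when the visible and infrared batches share the same mean feature. A training objective would typically add both terms to an identity classification loss, with weights that would have to be tuned.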

URL

https://arxiv.org/abs/2403.11708

PDF

https://arxiv.org/pdf/2403.11708.pdf

