Paper Reading AI Learner

Visible-Infrared Person Re-Identification via Patch-Mixed Cross-Modality Learning

2023-02-16 10:56:00
Zhihao Qian, Yutian Lin, Bo Du

Abstract

Visible-infrared person re-identification (VI-ReID) aims to retrieve images of the same pedestrian from different modalities, where the challenges lie in the significant modality discrepancy. To alleviate the modality gap, recent methods generate intermediate images by GANs, grayscaling, or mixup strategies. However, these methods could introduce extra noise, and the semantic correspondence between the two modalities is not well learned. In this paper, we propose a Patch-Mixed Cross-Modality framework (PMCM), where two images of the same person from two modalities are split into patches and stitched into a new one for model learning. In this way, the model learns to recognize a person through patches of different styles, and the modality semantic correspondence is directly embodied. With the flexible image generation strategy, the patch-mixed images freely adjust the ratio of different modality patches, which could further alleviate the modality imbalance problem. In addition, the relationship between identity centers among modalities is explored to further reduce the modality variance, and the global-to-part constraint is introduced to regularize representation learning of part features. On two VI-ReID datasets, we report new state-of-the-art performance with the proposed method.
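The core patch-mixing idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the grid size, the mixing ratio, and the function name `patch_mix` are illustrative assumptions.

```python
import numpy as np

def patch_mix(img_vis, img_ir, grid=(4, 4), ir_ratio=0.5, rng=None):
    """Stitch patches from a visible and an infrared image of the same
    person into one patch-mixed image (a sketch of the idea; grid size
    and ratio are illustrative assumptions, not the paper's values)."""
    rng = rng or np.random.default_rng()
    h, w = img_vis.shape[:2]
    gh, gw = grid
    ph, pw = h // gh, w // gw
    mixed = img_vis.copy()
    # Randomly choose a fraction of grid cells to take from the infrared image,
    # so the ratio of the two modalities can be adjusted freely via ir_ratio.
    n_cells = gh * gw
    n_ir = int(round(ir_ratio * n_cells))
    for c in rng.choice(n_cells, size=n_ir, replace=False):
        row, col = divmod(c, gw)
        y, x = row * ph, col * pw
        mixed[y:y + ph, x:x + pw] = img_ir[y:y + ph, x:x + pw]
    return mixed
```

Training on such stitched images forces the model to recognize a person from patches of both styles, which is how the modality semantic correspondence is embodied directly in the input.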

Abstract (translated)

Visible-infrared person re-identification (VI-ReID) aims to retrieve images of the same pedestrian across different modalities, where the significant modality discrepancy is the main challenge. To alleviate the modality gap, recent methods generate intermediate images via GANs, grayscaling, or mixup strategies. However, these methods may introduce extra noise, and the semantic correspondence between the two modalities is not well learned. In this paper, we propose a Patch-Mixed Cross-Modality framework (PMCM), in which two images of the same person from the two modalities are split into patches and stitched into a new image for model learning. In this way, the model learns to recognize a person through patches of different styles, and the modality semantic correspondence is directly embodied in the image patches. With the flexible image generation strategy, the patch-mixed images can freely adjust the ratio of patches from each modality, which further alleviates the modality imbalance problem. In addition, the relationship between identity centers across modalities is explored to further reduce the modality variance, and a global-to-part constraint is introduced to regularize the learning of part-feature representations. On two VI-ReID datasets, we report new state-of-the-art performance with the proposed method.

URL

https://arxiv.org/abs/2302.08212

PDF

https://arxiv.org/pdf/2302.08212.pdf
