Paper Reading AI Learner

Camera-aware Label Refinement for Unsupervised Person Re-identification

2024-03-25 06:22:27
Pengna Li, Kangyi Wu, Wenli Huang, Sanping Zhou, Jinjun Wang

Abstract

Unsupervised person re-identification aims to retrieve images of a specified person without identity labels. Many recent unsupervised Re-ID approaches adopt clustering-based methods that measure cross-camera feature similarity to roughly divide images into clusters. They ignore the feature distribution discrepancy induced by the camera domain gap, resulting in unavoidable performance degradation. Camera information is usually available, and the feature distribution within a single camera focuses more on the appearance of the individual and has less intra-identity variance. Inspired by this observation, we introduce a Camera-Aware Label Refinement (CALR) framework that reduces camera discrepancy by clustering intra-camera similarity. Specifically, we employ intra-camera training to obtain reliable local pseudo labels within each camera, then refine the global labels generated by inter-camera clustering and train the discriminative model with the more reliable global pseudo labels in a self-paced manner. Meanwhile, we develop a camera-alignment module to align feature distributions across different cameras, which further helps handle camera variance. Extensive experiments validate the superiority of our proposed method over state-of-the-art approaches. The code is accessible at this https URL.
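
The three stages the abstract describes (intra-camera clustering, inter-camera clustering, label refinement) can be pictured with a small sketch. The snippet below is an illustrative reading of that pipeline, not the authors' released code: the DBSCAN hyper-parameters and the majority-agreement refinement rule are assumptions made for the example.

```python
# A minimal sketch of the camera-aware label refinement idea summarized above,
# not the CALR implementation: the DBSCAN settings and the majority-agreement
# refinement rule are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import normalize


def cluster(feats, eps=0.5, min_samples=4):
    """Cluster L2-normalized features; DBSCAN marks outliers with label -1."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)


def camera_aware_refinement(feats, cam_ids):
    """feats: (N, D) embeddings; cam_ids: (N,) integer camera index per image."""
    feats = normalize(feats)                     # L2-normalize the embeddings
    cam_ids = np.asarray(cam_ids)

    # 1) Intra-camera clustering: local pseudo labels inside each camera,
    #    where there is no camera domain gap and appearance cues dominate.
    local_labels = np.full(len(feats), -1, dtype=int)
    for cam in np.unique(cam_ids):
        idx = np.where(cam_ids == cam)[0]
        local_labels[idx] = cluster(feats[idx])

    # 2) Inter-camera clustering over all images: noisy global pseudo labels.
    global_labels = cluster(feats)

    # 3) Refinement (assumed rule): inside each global cluster, a sample is
    #    kept only if its local label agrees with the majority local label of
    #    its same-camera members; disagreeing samples are marked unreliable
    #    (-1) and can be re-admitted in later self-paced training rounds.
    refined = global_labels.copy()
    for g in np.unique(global_labels):
        if g == -1:
            continue
        members = np.where(global_labels == g)[0]
        for cam in np.unique(cam_ids[members]):
            cam_members = members[cam_ids[members] == cam]
            valid = local_labels[cam_members][local_labels[cam_members] >= 0]
            majority = np.bincount(valid).argmax() if valid.size else -1
            disagree = local_labels[cam_members] != majority
            refined[cam_members[disagree]] = -1
    return local_labels, global_labels, refined
```

The camera-alignment module mentioned in the abstract would sit upstream of this step, aligning per-camera feature distributions before clustering; it is omitted from the sketch.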


URL

https://arxiv.org/abs/2403.16450

PDF

https://arxiv.org/pdf/2403.16450.pdf

