CCPA: Long-term Person Re-Identification via Contrastive Clothing and Pose Augmentation

2024-02-22 11:16:34
Vuong D. Nguyen, Shishir K. Shah

Abstract

Long-term Person Re-Identification (LRe-ID) aims at matching an individual across cameras after a long period of time, during which clothing, pose, and viewpoint may vary. In this work, we propose CCPA, a Contrastive Clothing and Pose Augmentation framework for LRe-ID. Beyond appearance, CCPA captures cloth-invariant body shape information using a Relation Graph Attention Network. Training a robust LRe-ID model requires a wide range of clothing variations and expensive cloth labeling, both of which are lacking in current LRe-ID datasets. To address this, we perform clothing and pose transfer across identities to generate images with more clothing variations and images of different persons wearing similar clothing. The augmented batch of images serves as input to our proposed Fine-grained Contrastive Losses, which not only supervise the Re-ID model to learn discriminative person embeddings under long-term scenarios but also ensure in-distribution data generation. Results on LRe-ID datasets demonstrate the effectiveness of our CCPA framework.

URL

https://arxiv.org/abs/2402.14454

PDF

https://arxiv.org/pdf/2402.14454.pdf

