Paper Reading AI Learner

Rethinking Clothes Changing Person ReID: Conflicts, Synthesis, and Optimization

2024-04-19 03:45:12
Junjie Li, Guanshuo Wang, Fufu Yu, Yichao Yan, Qiong Jia, Shouhong Ding, Xingdong Sheng, Yunhui Liu, Xiaokang Yang

Abstract

Clothes-changing person re-identification (CC-ReID) aims to retrieve images of the same person wearing different outfits. Mainstream research focuses on designing advanced model structures and strategies to capture identity information independent of clothing. However, same-clothes discrimination, the standard ReID learning objective, has been persistently ignored in previous CC-ReID research. In this study, we dive into the relationship between the standard and clothes-changing~(CC) learning objectives, and bring the inner conflicts between these two objectives to the fore. We magnify the proportion of CC training pairs by supplementing high-fidelity clothes-varying synthetic images, produced by our proposed Clothes-Changing Diffusion model. By incorporating the synthetic images into CC-ReID model training, we observe a significant improvement under the CC protocol. However, this improvement sacrifices performance under the standard protocol, owing to the inner conflict between the standard and CC objectives. To mitigate the conflict, we decouple these objectives and re-formulate CC-ReID learning as a multi-objective optimization (MOO) problem. By effectively regularizing the gradient curvature across multiple objectives and introducing preference restrictions, our MOO solution surpasses the single-task training paradigm. Our framework is model-agnostic and demonstrates superior performance under both the CC and standard ReID protocols.
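The abstract does not spell out the authors' MOO solver, but the idea of resolving gradient-level conflict between the standard and CC objectives can be illustrated with a minimal PCGrad-style sketch. Everything below is an assumption for illustration: the function name, the use of flattened NumPy gradient vectors, and the projection rule (drop each gradient's component along the other when their dot product is negative) are not taken from the paper.

```python
import numpy as np

def combine_gradients(g_std: np.ndarray, g_cc: np.ndarray) -> np.ndarray:
    """Hypothetical conflict-mitigated update direction for two objectives.

    If the standard-ReID gradient and the clothes-changing gradient
    conflict (negative cosine similarity), project each onto the normal
    plane of the other's original direction before summing, so neither
    objective's update directly undoes the other's progress.
    """
    g1, g2 = g_std.astype(float).copy(), g_cc.astype(float).copy()
    if np.dot(g_std, g_cc) < 0:
        # Remove the conflicting component of each gradient.
        g1 -= (np.dot(g1, g_cc) / np.dot(g_cc, g_cc)) * g_cc
        g2 -= (np.dot(g2, g_std) / np.dot(g_std, g_std)) * g_std
    return g1 + g2

# Conflicting gradients: the projected sum no longer opposes either task.
combined = combine_gradients(np.array([1.0, 0.0]), np.array([-1.0, 1.0]))
```

A preference restriction, as mentioned in the abstract, could then be layered on top, e.g. by rejecting update directions whose inner product with the standard-objective gradient falls below a threshold, but that detail is likewise not specified here.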

URL

https://arxiv.org/abs/2404.12611

PDF

https://arxiv.org/pdf/2404.12611.pdf

