Paper Reading AI Learner

PRSNet: A Masked Self-Supervised Learning Pedestrian Re-Identification Method

2023-03-11 07:20:32
Zhijie Xiao, Zhicheng Dong, Hao Xiang

Abstract

In recent years, self-supervised learning has attracted widespread academic attention and has addressed many key issues in computer vision. Current research focuses on how to construct a good pretext task that lets the network learn high-level semantic information from images, so that pre-training accelerates model reasoning on the current task. Existing feature extraction networks are pre-trained on the ImageNet dataset and cannot extract the fine-grained information in pedestrian images well, and the pretext tasks of existing contrastive self-supervised learning may destroy the original properties of pedestrian images. To solve these problems, this paper designs a mask-reconstruction pretext task to obtain a highly robust pre-trained model and uses it for pedestrian re-identification. The network is optimized with an improved centroid-based triplet loss in which the masked images are added to the loss calculation as additional samples, so that after training the network copes better with pedestrian matching in practical applications. Compared with existing self-supervised pedestrian re-identification methods, this method achieves about 5% higher mAP and about 1% higher Rank-1 accuracy on the Market-1501 and CUHK03 datasets, and ablation experiments demonstrate its feasibility. Our model code is located at this https URL.
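The abstract says training uses an improved centroid-based triplet loss with masked images added as extra samples, but it gives no formula. The Python sketch below assumes one common form of centroid triplet loss: each anchor is pulled toward the centroid of its own identity (anchor left out) and pushed away from the closest centroid of any other identity, with a hinge margin. Masked views are modeled simply as extra (identity, embedding) pairs. The function name `centroid_triplet_loss`, the default `margin=0.3`, and the hardest-negative-centroid choice are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: List[Vector]) -> List[float]:
    """Element-wise mean (centroid) of a non-empty list of embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def centroid_triplet_loss(
    samples: List[Tuple[int, Vector]],
    margin: float = 0.3,  # hypothetical default, not taken from the paper
) -> float:
    """Mean of max(0, d(a, c_pos) - d(a, c_neg) + margin) over all anchors.

    samples: (identity, embedding) pairs. Embeddings of masked views are
    just extra pairs with the same identity (assumed scheme).
    c_pos: centroid of the anchor's identity, excluding the anchor itself.
    c_neg: closest centroid of any other identity (hardest negative).
    """
    # Group embeddings by identity and compute one centroid per identity.
    by_id: Dict[int, List[Vector]] = defaultdict(list)
    for pid, emb in samples:
        by_id[pid].append(emb)
    centroids = {pid: mean(embs) for pid, embs in by_id.items()}

    losses = []
    for i, (pid, anchor) in enumerate(samples):
        # Positive centroid: same identity, with the anchor left out.
        others = [e for j, (p, e) in enumerate(samples) if p == pid and j != i]
        if not others:
            continue  # no other sample of this identity to form a centroid
        c_pos = mean(others)
        # Distances from the anchor to every other identity's centroid.
        neg_dists = [sq_dist(anchor, c) for p, c in centroids.items() if p != pid]
        if not neg_dists:
            continue  # batch contains a single identity
        d_neg = min(neg_dists)
        losses.append(max(0.0, sq_dist(anchor, c_pos) - d_neg + margin))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Two identities, each with a clean view and a (hypothetical) masked view.
    batch = [(0, [1.0, 0.0]), (0, [0.9, 0.1]), (1, [0.0, 1.0]), (1, [0.1, 0.9])]
    print(centroid_triplet_loss(batch))
```

Leaving the anchor out of its own positive centroid prevents the loss from trivially rewarding an embedding for being close to itself. Because a masked view shares the identity of its source image, it both serves as an anchor and shifts that identity's centroid, which is one plausible reading of "added as an additional sample"; whether the paper uses the hardest negative centroid or a sampled one is not stated in the abstract.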

URL

https://arxiv.org/abs/2303.06330

PDF

https://arxiv.org/pdf/2303.06330.pdf

