Mining False Positive Examples for Text-Based Person Re-identification

2023-03-15 09:10:51
Wenhao Xu, Zhiyin Shao, Changxing Ding

Abstract

Text-based person re-identification (ReID) aims to retrieve images of a target person from a large-scale person image database given a textual description. Because of the significant gap between the two modalities, it remains a challenging problem. Most existing methods rely heavily on the similarity contributed by matched word-region pairs, while neglecting mismatched word-region pairs, which may play a decisive role. To address this, we propose to mine false positive examples (MFPE) via a jointly optimized multi-branch architecture. MFPE contains three branches, including a false positive mining (FPM) branch that highlights the role of mismatched word-region pairs. In addition, MFPE introduces a cross-relu loss that increases the gap in similarity scores between matched and mismatched word-region pairs. Extensive experiments on CUHK-PEDES demonstrate the effectiveness of MFPE. Our code is released at this https URL.

URL

https://arxiv.org/abs/2303.08466

PDF

https://arxiv.org/pdf/2303.08466.pdf

