RaSa: Relation and Sensitivity Aware Representation Learning for Text-based Person Search

2023-05-23 03:53:57
Yang Bai, Min Cao, Daming Gao, Ziqiang Cao, Chen Chen, Zhenfeng Fan, Liqiang Nie, Min Zhang

Abstract

Text-based person search aims to retrieve images of a specified person given a textual description. The key to tackling this challenging task is to learn powerful multi-modal representations. Towards this end, we propose a Relation and Sensitivity aware representation learning method (RaSa), comprising two novel tasks: Relation-Aware learning (RA) and Sensitivity-Aware learning (SA). On the one hand, existing methods cluster the representations of all positive pairs without distinction and overlook the noise introduced by weak positive pairs, where the text and the paired image have noisy correspondences, which leads to overfitting. RA offsets this overfitting risk by introducing a novel positive relation detection task (i.e., learning to distinguish strong from weak positive pairs). On the other hand, learning representations that are invariant under data augmentation (i.e., insensitive to certain transformations) is a common practice for improving robustness in existing methods. Beyond that, SA encourages the representation to perceive sensitive transformations (i.e., learning to detect replaced words), thereby further promoting robustness. Experiments demonstrate that RaSa outperforms existing state-of-the-art methods by 6.94%, 4.45% and 15.35% in terms of Rank@1 on the CUHK-PEDES, ICFG-PEDES and RSTPReid datasets, respectively. Code is available at: this https URL.
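As a rough illustration of the two auxiliary objectives described above (a minimal sketch, not the authors' implementation; the module, tensor names, and label conventions below are hypothetical), RA can be read as a binary pair-relation classifier over a fused image-text embedding, and SA as a token-level replaced-word detector:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RaSaAuxHeads(nn.Module):
    """Toy heads for the two auxiliary objectives sketched in the abstract.

    RA: positive relation detection -- classify a fused image-text pair as a
        strong positive (original caption) or a weak positive (noisier text).
    SA: replaced-word detection -- flag, per token, whether a word in the
        caption was swapped by a sensitive transformation.
    Hypothetical simplification; assumes some cross-modal encoder already
    produced the fused embeddings.
    """

    def __init__(self, dim: int = 768):
        super().__init__()
        self.ra_head = nn.Linear(dim, 2)  # strong vs. weak positive pair
        self.sa_head = nn.Linear(dim, 2)  # original vs. replaced token

    def forward(self, fused_cls, fused_tokens, ra_labels, sa_labels):
        # fused_cls:    (B, dim)    pooled multi-modal embedding per pair
        # fused_tokens: (B, T, dim) token-level multi-modal embeddings
        # ra_labels:    (B,)        1 = strong positive, 0 = weak positive
        # sa_labels:    (B, T)      1 = replaced word,   0 = original word
        ra_loss = F.cross_entropy(self.ra_head(fused_cls), ra_labels)
        sa_loss = F.cross_entropy(
            self.sa_head(fused_tokens).flatten(0, 1), sa_labels.flatten()
        )
        return ra_loss, sa_loss

if __name__ == "__main__":
    # Smoke test with random tensors in place of real encoder outputs.
    heads = RaSaAuxHeads()
    ra, sa = heads(
        torch.randn(4, 768),
        torch.randn(4, 16, 768),
        torch.randint(0, 2, (4,)),
        torch.randint(0, 2, (4, 16)),
    )
    print(ra.item(), sa.item())
```

Read this way, both tasks reduce to cheap cross-entropy heads layered on top of the multi-modal encoder, which is why they can be added to a standard image-text matching pipeline at little extra cost.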

URL

https://arxiv.org/abs/2305.13653

PDF

https://arxiv.org/pdf/2305.13653.pdf

