
View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network

2024-03-21 16:08:21
Quan Zhang, Lei Wang, Vishal M. Patel, Xiaohua Xie, Jianhuang Lai


Existing person re-identification methods have achieved remarkable advances in appearance-based identity association across homogeneous cameras, such as ground-ground matching. However, as a more practical scenario, aerial-ground person re-identification (AGPReID) among heterogeneous cameras has received minimal attention. To alleviate the disruption of discriminative identity representation by dramatic view discrepancy as the most significant challenge in AGPReID, the view-decoupled transformer (VDT) is proposed as a simple yet effective framework. Two major components are designed in VDT to decouple view-related and view-unrelated features, namely hierarchical subtractive separation and orthogonal loss, where the former separates these two features inside the VDT, and the latter constrains these two to be independent. In addition, we contribute a large-scale AGPReID dataset called CARGO, consisting of five/eight aerial/ground cameras, 5,000 identities, and 108,563 images. Experiments on two datasets show that VDT is a feasible and effective solution for AGPReID, surpassing the previous method on mAP/Rank1 by up to 5.0%/2.7% on CARGO and 3.7%/5.2% on AG-ReID, keeping the same magnitude of computational complexity. Our project is available at this https URL
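The abstract names the two decoupling components but gives no formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that "subtractive separation" means subtracting a view-related feature from a global feature element-wise, and that the "orthogonal loss" penalizes the absolute cosine similarity between the view-related and view-unrelated features. The function names, the plain-list feature representation, and the `eps` parameter are all hypothetical.

```python
import math


def subtractive_separation(global_feat, view_feat):
    """Assumed form: remove the view-related part from the global feature
    by element-wise subtraction, leaving a view-unrelated feature."""
    return [g - v for g, v in zip(global_feat, view_feat)]


def orthogonal_loss(view_feats, id_feats, eps=1e-12):
    """Assumed form: mean absolute cosine similarity over a batch.
    The value is 0 when every view/identity feature pair is orthogonal
    and approaches 1 when the pairs are parallel."""
    total = 0.0
    for v, u in zip(view_feats, id_feats):
        dot = sum(a * b for a, b in zip(v, u))
        norm = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in u))
        total += abs(dot) / (norm + eps)  # eps guards against zero-norm features
    return total / len(view_feats)


# Toy usage with 2-D features (illustrative values only).
global_feat = [1.0, 2.0]
view_feat = [1.0, 0.0]
id_feat = subtractive_separation(global_feat, view_feat)
loss = orthogonal_loss([view_feat], [id_feat])
```

In this toy case the subtraction happens to leave a feature that is exactly orthogonal to the view feature, so the loss is zero. In a trained model the loss term would be what pushes the two features toward that state.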



