TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification


Abstract

Person re-identification (re-ID) via 3D skeleton data is an emerging topic with prominent advantages. Existing methods usually design skeleton descriptors with raw body joints or perform skeleton sequence representation learning. However, they typically cannot concurrently model different body-component relations, and rarely explore useful semantics from fine-grained representations of body joints. In this paper, we propose a generic Transformer-based Skeleton Graph prototype contrastive learning (TranSG) approach with structure-trajectory prompted reconstruction to fully capture skeletal relations and valuable spatial-temporal semantics from skeleton graphs for person re-ID. Specifically, we first devise the Skeleton Graph Transformer (SGT) to simultaneously learn body and motion relations within skeleton graphs, so as to aggregate key correlative node features into graph representations. Then, we propose Graph Prototype Contrastive learning (GPC) to mine the most typical graph features (graph prototypes) of each identity, and contrast the inherent similarity between graph representations and different prototypes at both the skeleton and sequence levels to learn discriminative graph representations. Finally, a graph Structure-Trajectory Prompted Reconstruction (STPR) mechanism is proposed to exploit the spatial and temporal contexts of graph nodes to prompt skeleton graph reconstruction, which facilitates capturing more valuable patterns and graph semantics for person re-ID. Empirical evaluations demonstrate that TranSG significantly outperforms existing state-of-the-art methods. We further show its generality under different graph modeling settings, RGB-estimated skeletons, and unsupervised scenarios.
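
To make the idea of contrasting graph representations against per-identity prototypes more concrete, here is a minimal sketch in plain Python. It is not the paper's GPC formulation. It assumes (hypothetically) that a prototype is the mean feature of an identity, that similarity is cosine similarity, and that the loss is a temperature-scaled softmax cross-entropy over prototypes. The function names and the `temperature` parameter are illustrative only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def class_prototypes(features, labels):
    """Assumed prototype definition: the mean feature vector of each identity."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def prototype_contrastive_loss(features, labels, temperature=0.1):
    """Pull each feature toward its own identity prototype, away from the others.

    Illustrative loss: cross-entropy over cosine similarities between a feature
    and every prototype, divided by a temperature (an assumed hyperparameter).
    """
    protos = {
        label: l2_normalize(proto)
        for label, proto in class_prototypes(features, labels).items()
    }
    ids = sorted(protos)
    total = 0.0
    for feat, label in zip(features, labels):
        feat = l2_normalize(feat)
        logits = [
            sum(a * b for a, b in zip(feat, protos[k])) / temperature for k in ids
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[ids.index(label)]
    return total / len(features)


# Toy "graph representations" for two identities.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_consistent = prototype_contrastive_loss(toy_features, [0, 0, 1, 1])
loss_mixed = prototype_contrastive_loss(toy_features, [0, 1, 0, 1])
```

In this toy setup, features that cluster by identity give a much lower loss than the same features with labels that mix the clusters, which is the behavior a prototype-based contrastive objective is meant to reward.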

URL

https://arxiv.org/abs/2303.06819

PDF

https://arxiv.org/pdf/2303.06819.pdf