Feature Completion Transformer for Occluded Person Re-identification

2023-03-03 01:12:57
Tao Wang, Hong Liu, Wenhao Li, Miaoju Ban, Tuanyu Guo, Yidi Li

Abstract

Occluded person re-identification (Re-ID) is a challenging problem due to the destruction caused by occluders. Most existing methods use prior information to focus on visible human body parts. However, when complementary occlusions occur, features in occluded regions can interfere with matching, which severely degrades performance. In this paper, unlike most previous works that discard the occluded region, we propose a Feature Completion Transformer (FCFormer) to implicitly complement the semantic information of occluded parts in the feature space. Specifically, Occlusion Instance Augmentation (OIA) is proposed to simulate real and diverse occlusion situations on the holistic image. These augmented images not only increase the number of occlusion samples in the training set, but also form pairs with the holistic images. Subsequently, a dual-stream architecture with a shared encoder is proposed to learn discriminative features from these paired inputs. Without additional semantic information, an occluded-holistic feature sample-label pair can be created automatically. Then, a Feature Completion Decoder (FCD) is designed to complement the features of occluded regions by using learnable tokens to aggregate possible information from self-generated occluded features. Finally, we propose the Cross Hard Triplet (CHT) loss to further bridge the gap between completed features and extracted features under the same ID. In addition, a Feature Completion Consistency (FC$^2$) loss is introduced to bring the distribution of generated completion features closer to the real holistic feature distribution. Extensive experiments on five challenging datasets demonstrate that the proposed FCFormer achieves superior performance and outperforms state-of-the-art methods by significant margins on occluded datasets.
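
The abstract does not give implementation details for OIA, so the following is only a minimal sketch of the general idea it describes: pasting an occluder patch onto a holistic image so that the original and the occluded copy form a training pair. The function name `paste_occluder`, the list-of-rows image representation, and the uniform random placement are all assumptions made for illustration and are not taken from the paper.

```python
import random
from typing import List, Tuple

# Illustrative assumption: a grayscale image stored as rows of pixel values.
Image = List[List[int]]


def paste_occluder(
    holistic: Image,
    occluder: Image,
    rng: random.Random,
) -> Tuple[Image, Image, Tuple[int, int]]:
    """Build a (holistic, occluded) pair by pasting a patch at a random spot.

    Hypothetical helper: the placement rule here (uniform over positions
    where the patch fits entirely) is an assumption, not the paper's rule.
    """
    height, width = len(holistic), len(holistic[0])
    patch_h, patch_w = len(occluder), len(occluder[0])
    if patch_h > height or patch_w > width:
        raise ValueError("occluder patch must fit inside the image")

    # randint is inclusive on both ends, so the patch always stays in bounds.
    top = rng.randint(0, height - patch_h)
    left = rng.randint(0, width - patch_w)

    # Copy every row so the holistic image is left untouched.
    occluded = [row[:] for row in holistic]
    for dy in range(patch_h):
        occluded[top + dy][left:left + patch_w] = occluder[dy]

    return holistic, occluded, (top, left)


if __name__ == "__main__":
    rng = random.Random(0)
    person = [[1] * 4 for _ in range(6)]   # toy 6x4 "holistic" image
    patch = [[9, 9], [9, 9]]               # toy 2x2 occluder
    _, occluded_person, position = paste_occluder(person, patch, rng)
    print("occluder pasted at", position)
    for row in occluded_person:
        print(row)
```

In a dual-stream setup like the one described, both images of the pair would be passed through the same encoder, with the holistic features serving as the target for the completed occluded features; how the paper actually selects and places occluders may differ from this sketch.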

URL

https://arxiv.org/abs/2303.01656

PDF

https://arxiv.org/pdf/2303.01656.pdf

