Feature Completion Transformer for Occluded Person Re-identification

Abstract

Occluded person re-identification (Re-ID) is a challenging problem because occluders destroy part of the pedestrian's appearance. Most existing methods use prior information to focus on the visible human body parts. However, when complementary occlusions occur, features from the occluded regions can interfere with matching and severely degrade performance. In this paper, unlike most previous works that discard the occluded region, we propose a Feature Completion Transformer (FCFormer) to implicitly complete the semantic information of occluded parts in the feature space. Specifically, Occlusion Instance Augmentation (OIA) is proposed to simulate real and diverse occlusion situations on the holistic image. These augmented images not only increase the number of occluded samples in the training set, but also form pairs with the holistic images. Subsequently, a dual-stream architecture with a shared encoder is proposed to learn paired discriminative features from these input pairs. In this way, an occluded-holistic feature sample-label pair can be created automatically, without additional semantic information. Then, a Feature Completion Decoder (FCD) is designed to complete the features of occluded regions by using learnable tokens to aggregate possible information from the self-generated occluded features. Finally, we propose the Cross Hard Triplet (CHT) loss to further bridge the gap between the completed features and the extracted features of the same ID. In addition, a Feature Completion Consistency (FC$^2$) loss is introduced to pull the distribution of the generated completion features closer to the distribution of real holistic features. Extensive experiments on five challenging datasets demonstrate that the proposed FCFormer achieves superior performance and outperforms state-of-the-art methods by significant margins on occluded datasets.
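
As a rough illustration of the idea behind the CHT loss, the sketch below computes a batch-hard triplet loss across two feature sets: each completed feature acts as the anchor, and its hardest positive and hardest negative are mined from the extracted holistic features. The abstract does not give the loss formula, so the function name `cross_hard_triplet`, the Euclidean distance, the margin of 0.3, and the averaging over anchors are all assumptions made for illustration, not the authors' definition.

```python
import math


def cross_hard_triplet(completed, holistic, completed_ids, holistic_ids, margin=0.3):
    """Batch-hard triplet loss with anchors and candidates from different sets.

    Hypothetical sketch: `completed` stands in for features produced by a
    completion decoder, `holistic` for features extracted from unoccluded
    images. Both are lists of equal-length float vectors.
    """
    losses = []
    for anchor, anchor_id in zip(completed, completed_ids):
        # Distances from this completed feature to every holistic feature.
        pos = [math.dist(anchor, h) for h, i in zip(holistic, holistic_ids) if i == anchor_id]
        neg = [math.dist(anchor, h) for h, i in zip(holistic, holistic_ids) if i != anchor_id]
        if not pos or not neg:
            continue  # no valid triplet for this anchor
        # Hardest positive = farthest same-ID feature;
        # hardest negative = closest different-ID feature.
        losses.append(max(0.0, max(pos) - min(neg) + margin))
    # Assumption: average over anchors that produced a valid triplet.
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    completed = [[0.0, 0.0]]
    holistic = [[3.0, 4.0], [0.0, 1.0]]
    print(cross_hard_triplet(completed, holistic, [0], [0, 1]))
```

In this toy call the same-ID holistic feature lies farther from the anchor than the different-ID one, so the loss is positive. It drops to zero once every completed feature is closer, by at least the margin, to all holistic features of its own ID than to any feature of another ID.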

URL

https://arxiv.org/abs/2303.01656

PDF

https://arxiv.org/pdf/2303.01656.pdf