Abstract
Many existing multi-modality studies assume modality integrity, i.e., that every modality is available. In practice, however, arbitrary modalities are often missing, a problem that is important but little studied in multi-modality person re-identification (Re-ID). To this end, we design a novel dynamic enhancement network (DENet) for partial multi-modality person Re-ID, which tolerates arbitrary missing modalities while maintaining the representation ability of multiple modalities. Specifically, the multi-modal representation of RGB, near-infrared (NIR) and thermal-infrared (TIR) images is learned by three branches, in which the information of missing modalities is recovered by a feature transformation module. Since the missing state may change, we design a dynamic enhancement module that adaptively enhances modality features according to the missing state to improve the multi-modality representation. Extensive experiments on the multi-modality person Re-ID dataset RGBNT201 and the vehicle Re-ID dataset RGBNT100, with comparisons to state-of-the-art methods, verify the effectiveness of our method in complex and changeable environments.
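The data flow described in the abstract (per-modality features, recovery of missing modalities, enhancement conditioned on the missing state) can be illustrated with a toy sketch. This is not the DENet implementation: the abstract gives no architectural details, so every name here (`recover_missing`, `enhance`, `represent`, `recovered_weight`) is hypothetical. The mean-based recovery and the fixed down-weighting are placeholder assumptions standing in for the learned feature transformation and dynamic enhancement modules.

```python
from typing import Dict, List, Optional

MODALITIES = ("rgb", "nir", "tir")
Feature = List[float]


def mean_feature(features: List[Feature]) -> Feature:
    """Element-wise mean of equally sized feature vectors."""
    return [sum(vals) / len(vals) for vals in zip(*features)]


def recover_missing(feats: Dict[str, Optional[Feature]]) -> Dict[str, Feature]:
    """Placeholder for a feature transformation module (assumption):
    fill each missing modality with the mean of the available ones.
    A real module would be learned rather than a fixed average."""
    available = [f for f in feats.values() if f is not None]
    if not available:
        raise ValueError("at least one modality must be present")
    fallback = mean_feature(available)
    return {m: feats[m] if feats[m] is not None else fallback for m in MODALITIES}


def enhance(feats: Dict[str, Feature], missing: Dict[str, bool],
            recovered_weight: float = 0.5) -> Feature:
    """Placeholder for dynamic enhancement (assumption): weight each
    modality according to the missing state, then concatenate."""
    fused: Feature = []
    for m in MODALITIES:
        weight = recovered_weight if missing[m] else 1.0
        fused.extend(weight * x for x in feats[m])
    return fused


def represent(sample: Dict[str, Optional[Feature]]) -> Feature:
    """Build one fused representation from a possibly incomplete sample."""
    missing = {m: sample.get(m) is None for m in MODALITIES}
    full = recover_missing({m: sample.get(m) for m in MODALITIES})
    return enhance(full, missing)
```

Calling `represent({"rgb": [1.0, 2.0], "nir": None, "tir": [3.0, 4.0]})` fills the NIR slot from the RGB and TIR features and down-weights it before concatenation. The point of the sketch is only that the same function handles any missing pattern, as long as at least one modality is present.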
URL
https://arxiv.org/abs/2305.15762