Paper Reading AI Learner

Modality Unified Attack for Omni-Modality Person Re-Identification

2025-01-22 09:54:43
Yuan Bian, Min Liu, Yunqi Yi, Xueping Wang, Yunfeng Ma, Yaonan Wang

Abstract

Deep learning based person re-identification (re-id) models have been widely employed in surveillance systems. Recent studies have demonstrated that black-box single-modality and cross-modality re-id models are vulnerable to adversarial examples (AEs), but the robustness of multi-modality re-id models remains unexplored. Because the specific type of model deployed in a target black-box surveillance system is unknown, we aim to generate modality-unified AEs for omni-modality (single-, cross- and multi-modality) re-id models. Specifically, we propose a novel Modality Unified Attack (MUA) method that trains modality-specific adversarial generators to produce AEs that effectively attack the different omni-modality models. A multi-modality model is adopted as the surrogate model, in which the features of each modality are perturbed by a metric disruption loss before fusion. To collapse the features common to omni-modality models, a Cross Modality Simulated Disruption approach is introduced that mimics cross-modality feature embeddings by intentionally feeding images to the non-corresponding modality-specific subnetworks of the surrogate model. Moreover, a Multi Modality Collaborative Disruption strategy is devised to help the attacker comprehensively corrupt the informative content of person images by leveraging a multi-modality feature collaborative metric disruption loss. Extensive experiments show that our MUA method can effectively attack omni-modality re-id models, achieving 55.9%, 24.4%, 49.0% and 62.7% mean mAP Drop Rate, respectively.

Abstract (translated)

Deep learning based person re-identification (re-id) models have been widely applied in surveillance systems. Recent studies show that black-box single-modality and cross-modality re-id models are vulnerable to adversarial examples (AEs), while the robustness of multi-modality re-id models remains unclear. Since the specific type of model deployed in the target black-box surveillance system is unknown, our work aims to generate unified AEs applicable to omni-modality (single-, cross- and multi-modality) re-id models. To this end, we propose a new method named "Modality Unified Attack" (MUA), which trains modality-specific adversarial generators to produce adversarial examples that effectively attack different omni-modality models. In our scheme, a multi-modality model is adopted as the surrogate model, in which the features of each modality are perturbed by a metric disruption loss before fusion. To invalidate the features shared by the various omni-modality models, a "Cross Modality Simulated Disruption" approach is introduced, which mimics cross-modality feature embeddings by deliberately feeding images into the non-corresponding modality-specific subnetworks of the surrogate model. In addition, a "Multi Modality Collaborative Disruption" strategy is designed to help the attacker comprehensively destroy the informative content of person images, using a multi-modality feature collaborative metric disruption loss. Extensive experiments show that our MUA method can effectively attack omni-modality re-id models, achieving mean mAP Drop Rates of 55.9%, 24.4% and 62.7% on single-, cross- and multi-modality models, respectively, and a mean mAP Drop Rate of 49.0% on a comprehensive test set containing all three types. In summary, our work offers a new perspective and methodology for improving the security and robustness of re-id models.
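The core recipe the abstract describes — perturbing each modality's features with a metric disruption loss before fusion, and simulating cross-modality embeddings by routing an image through the non-corresponding modality branch — can be sketched as a toy example. Everything below (the linear "subnetworks" `W_rgb`/`W_ir`, the exact loss form, the shapes) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the modality-specific feature subnetworks of the
# multi-modality surrogate model (hypothetical toy linear maps).
W_rgb = rng.standard_normal((128, 64))
W_ir = rng.standard_normal((128, 64))

def extract(W, x):
    """Toy 'subnetwork': linear map followed by L2 normalization."""
    f = W.T @ x
    return f / (np.linalg.norm(f) + 1e-12)

def metric_disruption_loss(f_adv, f_clean):
    """Assumed form of a metric disruption loss: the attacker minimizes
    this value, i.e. maximizes the embedding distance between the
    adversarial and clean features."""
    return -float(np.linalg.norm(f_adv - f_clean))

# A clean 'image' and its adversarially perturbed version.
x_clean = rng.standard_normal(128)
x_adv = x_clean + 0.1 * rng.standard_normal(128)

# Same-modality disruption: RGB image through the RGB branch.
loss_same = metric_disruption_loss(extract(W_rgb, x_adv),
                                   extract(W_rgb, x_clean))

# Cross Modality Simulated Disruption (sketch): feed the RGB image
# into the non-corresponding infrared branch to mimic cross-modality
# feature embeddings, then disrupt those as well.
loss_cross = metric_disruption_loss(extract(W_ir, x_adv),
                                    extract(W_ir, x_clean))

# Jointly minimizing both terms pushes the attack to collapse features
# that the different omni-modality models have in common.
total_loss = loss_same + loss_cross
```

In a real attack the generator's parameters would be updated by gradient descent on such a loss; here the point is only the structure: one disruption term per modality branch, including branches the image does not natively belong to.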

URL

https://arxiv.org/abs/2501.12761

PDF

https://arxiv.org/pdf/2501.12761.pdf

