Paper Reading AI Learner

BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks

2024-04-01 05:01:52
Zhiyuan Cheng, Zhaoyi Liu, Tengda Guo, Shiwei Feng, Dongfang Liu, Mingjie Tang, Xiangyu Zhang

Abstract

Pixel-wise regression tasks (e.g., monocular depth estimation (MDE) and optical flow estimation (OFE)) have been widely involved in our daily life in applications like autonomous driving, augmented reality and video composition. Although certain applications are security-critical or bear societal significance, the adversarial robustness of such models are not sufficiently studied, especially in the black-box scenario. In this work, we introduce the first unified black-box adversarial patch attack framework against pixel-wise regression tasks, aiming to identify the vulnerabilities of these models under query-based black-box attacks. We propose a novel square-based adversarial patch optimization framework and employ probabilistic square sampling and score-based gradient estimation techniques to generate the patch effectively and efficiently, overcoming the scalability problem of previous black-box patch attacks. Our attack prototype, named BadPart, is evaluated on both MDE and OFE tasks, utilizing a total of 7 models. BadPart surpasses 3 baseline methods in terms of both attack performance and efficiency. We also apply BadPart on the Google online service for portrait depth estimation, causing 43.5% relative distance error with 50K queries. State-of-the-art (SOTA) countermeasures cannot defend our attack effectively.

Abstract (translated)

像素级回归任务(例如,单目深度估计(MDE)和光学流估计(OFE))在日常生活中广泛应用于自动驾驶、增强现实和视频编辑等应用中。虽然某些应用是安全关键或具有社会意义,但这类模型的对抗性鲁棒性尚未得到充分研究,尤其是在黑盒场景中。在本文中,我们提出了第一个针对像素级回归任务的统一黑盒攻击补丁攻击框架,旨在识别这些模型在基于查询的黑盒攻击下的漏洞。我们提出了一个新颖的平方基攻击补丁优化框架,并采用概率平方抽样和基于分数的梯度估计技术来生成补丁,有效克服了以前黑盒补丁攻击的规模问题。我们的攻击原型名为BadPart,在MDE和OFE任务上进行评估,使用了7个模型。BadPart在攻击效果和效率方面超过了3个基线方法。我们还将在Google在线服务上应用BadPart进行肖像深度估计,导致50K个查询的相对距离误差为43.5%。目前最先进的防御措施无法有效防御我们的攻击。

URL

https://arxiv.org/abs/2404.00924

PDF

https://arxiv.org/pdf/2404.00924.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot