Physical Adversarial Textures that Fool Visual Object Tracking

2019-04-24 19:56:57
Rey Reza Wiyatno, Anqi Xu

Abstract

We present a system for generating inconspicuous-looking textures that, when displayed in the physical world as digital or printed posters, cause visual object tracking systems to become confused. For instance, as a target being tracked by a robot's camera moves in front of such a poster, our generated texture makes the tracker lock onto the poster instead, allowing the target to evade tracking. This work aims to fool seldom-targeted regression tasks, and in particular compares diverse optimization strategies: non-targeted, targeted, and a new family of guided adversarial losses. We use the Expectation Over Transformation (EOT) algorithm to generate physical adversaries that fool tracking models when imaged under diverse conditions, and we compare the impacts of different conditioning variables, including viewpoint, lighting, and appearance, to find practical attack setups with high resulting adversarial strength and fast convergence. We further show that textures optimized solely using simulated scenes can confuse real-world tracking systems.
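
The abstract names Expectation Over Transformation (EOT) with a targeted loss but gives no implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy linear regressor `w` standing in for the tracker, random lighting changes (a gain and a bias) as the only transformation, and a squared-error targeted loss; the function names and all parameter values are hypothetical.

```python
import numpy as np

def eot_targeted_attack(w, target, steps=200, samples=16, lr=0.05, seed=0):
    """Optimize a texture so a toy linear regressor outputs `target`
    on average over random lighting changes (an EOT-style objective)."""
    rng = np.random.default_rng(seed)
    texture = np.full(w.shape, 0.5)  # start from a uniform gray texture
    for _ in range(steps):
        grad = np.zeros_like(texture)
        for _ in range(samples):
            # Assumed transformation: brightness gain and additive bias.
            gain = rng.uniform(0.8, 1.2)
            bias = rng.uniform(-0.05, 0.05)
            pred = w @ (gain * texture + bias)
            # Analytic gradient of (pred - target)^2 w.r.t. the texture.
            grad += 2.0 * (pred - target) * gain * w
        # Step along the averaged gradient, keeping valid pixel values.
        texture = np.clip(texture - lr * grad / samples, 0.0, 1.0)
    return texture

def expected_loss(w, texture, target, samples=1000, seed=1):
    """Monte Carlo estimate of the targeted loss under the same lighting model."""
    rng = np.random.default_rng(seed)
    gains = rng.uniform(0.8, 1.2, samples)
    biases = rng.uniform(-0.05, 0.05, samples)
    preds = gains * (w @ texture) + biases * w.sum()
    return float(np.mean((preds - target) ** 2))
```

Averaging the gradient over sampled transformations is what distinguishes this from optimizing against a single fixed view. A real attack would replace the linear model with a tracking network and the lighting model with rendered scenes under varied viewpoints and appearances.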

URL

https://arxiv.org/abs/1904.11042

PDF

https://arxiv.org/pdf/1904.11042.pdf

