
Adaptive Adversarial Attack on Scene Text Recognition

2018-07-09 18:12:27
Xiaoyong Yuan, Pan He, Xiaolin Andy Li

Abstract

Recent studies have shown that state-of-the-art deep learning models are vulnerable to inputs with small perturbations (adversarial examples). We observe two critical obstacles in adversarial examples: (i) strong adversarial attacks require manually tuning hyper-parameters, which lengthens the time needed to construct a single adversarial example and makes it impractical to attack real-time systems; (ii) most studies focus on non-sequential tasks, such as image classification and object detection, and only a few consider sequential tasks. Despite extensive research, the cause of adversarial examples remains an open problem, especially on sequential tasks. We propose an adaptive adversarial attack, called AdaptiveAttack, to speed up the process of generating adversarial examples. To validate its effectiveness, we leverage the scene text detection task as a case study of sequential adversarial examples. We further visualize the generated adversarial examples to analyze the cause of sequential adversarial examples. AdaptiveAttack achieved a success rate of over 99.9% with a 3-6x speedup compared to state-of-the-art adversarial attacks.


URL

https://arxiv.org/abs/1807.03326

PDF

https://arxiv.org/pdf/1807.03326.pdf

