Paper Reading AI Learner

How do humans perceive adversarial text? A reality check on the validity and naturalness of word-based adversarial attacks

2023-05-24 21:52:13
Salijona Dyrmishi, Salah Ghamizi, Maxime Cordy

Abstract

Natural Language Processing (NLP) models based on Machine Learning (ML) are susceptible to adversarial attacks -- malicious algorithms that imperceptibly modify input text to force models into making incorrect predictions. However, evaluations of these attacks ignore the property of imperceptibility or study it only in limited settings. This entails that adversarial perturbations would not pass any human quality gate and therefore do not represent real threats to human-checked NLP systems. To bypass this limitation and enable proper assessment (and later, improvement) of NLP model robustness, we have surveyed 378 human participants about the perceptibility of text adversarial examples produced by state-of-the-art methods. Our results underline that existing text attacks are impractical in real-world scenarios where humans are involved. This contrasts with previous smaller-scale human studies, which reported overly optimistic conclusions regarding attack success. Through our work, we hope to position human perceptibility as a first-class success criterion for text attacks, and provide guidance for research to build effective attack algorithms and, in turn, design appropriate defence mechanisms.
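To make concrete what a word-based adversarial attack does, here is a minimal, self-contained sketch (not the paper's method, and not a real attack library): a greedy word-substitution loop swaps words for synonyms until a toy keyword-based classifier flips its prediction. All names and the synonym table are illustrative assumptions.

```python
# Toy sketch of a word-substitution adversarial attack: greedily swap
# words for synonyms until a dummy classifier changes its prediction.
# This is an illustration only, not the method evaluated in the paper.

SYNONYMS = {
    "terrible": ["dreadful", "awful"],
    "boring": ["dull", "tedious"],
}

def toy_classifier(text):
    """Pretend sentiment model: only reacts to the exact words it 'knows'."""
    negative_words = {"terrible", "boring"}
    return "negative" if set(text.lower().split()) & negative_words else "positive"

def greedy_substitution_attack(text, classify, synonyms):
    """Replace words one at a time; stop when the predicted label flips."""
    original_label = classify(text)
    words = text.split()
    for i, word in enumerate(words):
        for candidate in synonyms.get(word.lower(), []):
            trial = words[:i] + [candidate] + words[i + 1:]
            perturbed = " ".join(trial)
            if classify(perturbed) != original_label:
                return perturbed  # adversarial example found
            words = trial  # keep the substitution and continue greedily
    return None  # attack failed to flip the prediction

adv = greedy_substitution_attack("the movie was terrible", toy_classifier, SYNONYMS)
print(adv)  # "the movie was dreadful" -- flips the toy model's label
```

Real attacks (e.g. synonym-substitution methods studied in the paper) use embedding neighbourhoods and query a trained model rather than a keyword lookup, but the search structure is the same; the paper's question is whether the resulting substitutions remain imperceptible to humans.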

Abstract (translated)

Natural Language Processing (NLP) models based on Machine Learning (ML) are vulnerable to adversarial attacks, in which attackers imperceptibly modify input text through small changes to force models into making incorrect predictions. However, evaluations of these attacks overlook the property of imperceptibility or study it only under limited conditions. As a result, such adversarial perturbations would not pass any human quality gate and do not pose a real threat to human-checked NLP systems. To overcome this limitation and properly assess (and later improve) the robustness of NLP models, we surveyed 378 human participants on the perceptibility of text adversarial examples generated by state-of-the-art methods. Our results show that, in real-world scenarios involving humans, existing text attacks are impractical. This contrasts with previous smaller-scale human studies, which reported overly optimistic conclusions about attack success. Through our work, we hope to establish human perceptibility as a first-class success criterion for text attacks, and to provide guidance for researchers to build effective attack algorithms and, in turn, design appropriate defence mechanisms.

URL

https://arxiv.org/abs/2305.15587

PDF

https://arxiv.org/pdf/2305.15587.pdf

