Paper Reading AI Learner

Interpretable Adversarial Training for Text

2019-05-30 05:55:58
Samuel Barham, Soheil Feizi

Abstract

Generating high-quality and interpretable adversarial examples in the text domain is a much more daunting task than it is in the image domain. This is due partly to the discrete nature of text, partly to the problem of ensuring that the adversarial examples are still probable and interpretable, and partly to the problem of maintaining label invariance under input perturbations. In order to address some of these challenges, we introduce sparse projected gradient descent (SPGD), a new approach to crafting interpretable adversarial examples for text. SPGD imposes a directional regularization constraint on input perturbations by projecting them onto the directions to nearby word embeddings with highest cosine similarities. This constraint ensures that perturbations move each word embedding in an interpretable direction (i.e., towards another nearby word embedding). Moreover, SPGD imposes a sparsity constraint on perturbations at the sentence level by ignoring word-embedding perturbations whose norms are below a certain threshold. This constraint ensures that our method changes only a few words per sequence, leading to higher quality adversarial examples. Our experiments with the IMDB movie review dataset show that the proposed SPGD method improves adversarial example interpretability and likelihood (evaluated by average per-word perplexity) compared to state-of-the-art methods, while suffering little to no loss in training performance.
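The two constraints the abstract describes can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration, not the paper's implementation: the function name and threshold are mine, and for simplicity each perturbation is projected onto the single direction with highest cosine similarity (the paper projects onto the directions to nearby embeddings with highest cosine similarities), after which perturbations with small norms are zeroed out at the sentence level.

```python
import numpy as np

def sparse_projected_step(seq_emb, grad, vocab_emb, tau=0.5):
    """Hypothetical sketch of one SPGD-style projection step.

    seq_emb:   (T, d) embeddings of the words in the input sequence
    grad:      (T, d) raw gradient-based perturbation for each word
    vocab_emb: (V, d) the vocabulary's word-embedding matrix
    tau:       sparsity threshold on the per-word perturbation norm
    """
    pert = np.zeros_like(grad)
    for i in range(len(seq_emb)):
        g = grad[i]
        gnorm = np.linalg.norm(g)
        if gnorm < 1e-12:
            continue
        # Directions from this word's embedding to every vocab embedding.
        dirs = vocab_emb - seq_emb[i]
        dnorms = np.linalg.norm(dirs, axis=1)
        valid = dnorms > 1e-8  # exclude the word's own embedding
        cos = np.full(len(vocab_emb), -np.inf)
        cos[valid] = dirs[valid] @ g / (dnorms[valid] * gnorm)
        # Directional regularization: project the perturbation onto the
        # unit direction most cosine-similar to the gradient, so the word
        # moves toward another real word embedding.
        j = int(np.argmax(cos))
        d = dirs[j] / dnorms[j]
        pert[i] = (g @ d) * d
    # Sentence-level sparsity: ignore perturbations whose norm falls
    # below the threshold, so only a few words per sequence change.
    pert[np.linalg.norm(pert, axis=1) < tau] = 0.0
    return pert
```

After this step, every surviving perturbation points from a word embedding toward some other embedding in the vocabulary, and most words in the sequence are left untouched, which is what makes the resulting adversarial example interpretable.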


URL

https://arxiv.org/abs/1905.12864

PDF

https://arxiv.org/pdf/1905.12864.pdf
