Abstract
Recent work has shown that 01 loss sign activation neural networks can defend against image classification adversarial attacks; a public challenge to attack such models on the CIFAR10 dataset remains undefeated. In this study we ask the following question: are 01 loss sign activation neural networks hard to deceive with a popular black box text adversarial attack program called TextFooler? We study this question on four popular text classification datasets: IMDB reviews, Yelp reviews, MR sentiment classification, and AG news classification. We find that our 01 loss sign activation network is much harder to attack with TextFooler than sigmoid activation cross entropy and binary neural networks. We also study a 01 loss sign activation convolutional neural network with a novel global pooling step specific to sign activation networks. With this new variation we see a significant gain in adversarial accuracy, rendering TextFooler practically useless against it. We make our code freely available at \url{this https URL} and \url{this https URL}. Our work here suggests that 01 loss sign activation networks could be further developed to create foolproof models against text adversarial attacks.
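The abstract does not spell out the global pooling step, so the following is only a hypothetical sketch of one plausible design: sign activations produce {-1, +1} feature maps, and a global pool per channel can itself be a sign operation (a majority vote over spatial positions), keeping the pooled output binary. The function names and the majority-vote rule are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def sign_activation(x):
    # Map pre-activations to {-1, +1}; zero is treated as +1 here
    # (an assumption -- the paper's tie-breaking rule is not stated).
    return np.where(x >= 0, 1, -1)

def global_majority_pool(feature_map):
    # feature_map: (channels, positions) array of {-1, +1} activations.
    # Pool each channel by majority vote: the sign of the spatial sum.
    # This keeps the pooled representation in {-1, +1}, matching the
    # binary nature of sign activation networks.
    return sign_activation(feature_map.sum(axis=1))

# Toy example: 2 channels over 5 spatial positions.
pre_act = np.array([[ 0.3, -1.2, 0.8, -0.1,  0.5],
                    [-0.4, -0.9, 0.2, -0.3, -0.7]])
pooled = global_majority_pool(sign_activation(pre_act))
print(pooled)  # → [ 1 -1]
```

A majority-vote pool is one natural fit for sign activations because flipping a single input word rarely flips a whole channel's vote, which is consistent with the robustness behavior the abstract reports.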
URL
https://arxiv.org/abs/2402.07347