Abstract
Text classification systems have continuously improved in performance over the years. However, nearly all current SOTA classifiers share a similar shortcoming: they process text horizontally. Vertically written words will not be recognized by a classifier. In contrast, humans can easily recognize and read words written both horizontally and vertically. Hence, a human adversary could write problematic words vertically and the meaning would still be preserved for other humans. We simulate such an attack, VertAttack. VertAttack identifies which words a classifier relies on and then rewrites those words vertically. We find that VertAttack greatly drops the accuracy of 4 different transformer models on 5 datasets. For example, on the SST2 dataset, VertAttack drops RoBERTa's accuracy from 94% to 13%. Furthermore, since VertAttack does not replace the word, meaning is easily preserved. We verify this via a human study and find that crowdworkers are able to correctly label 77% of perturbed texts, compared to 81% of the original texts. We believe VertAttack offers a look into how humans might circumvent classifiers in the future and thus inspires work on more robust algorithms.
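The perturbation step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the actual VertAttack selects words by probing the classifier's reliance on them, whereas here the target words are supplied explicitly, and the function names (`write_vertically`, `perturb`) are hypothetical.

```python
def write_vertically(word):
    """Rewrite a word with one character per line, preserving its spelling."""
    return "\n".join(word)

def perturb(text, target_words):
    """Replace each targeted word in the text with its vertical form.

    Assumption: in the real attack, target_words would be the words the
    classifier is found to rely on; here they are given for illustration.
    """
    out = []
    for token in text.split(" "):
        out.append(write_vertically(token) if token in target_words else token)
    return " ".join(out)
```

For example, `perturb("a bad movie", {"bad"})` keeps the text human-readable while the classifier's horizontal tokenizer no longer sees the word "bad" as a single token.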
URL
https://arxiv.org/abs/2404.08538