Abstract
Text classification systems have continuously improved in performance over the years. However, nearly all current SOTA classifiers share a similar shortcoming: they process text horizontally. Vertically written words will not be recognized by a classifier. In contrast, humans can easily recognize and read words written both horizontally and vertically. Hence, a human adversary could write problematic words vertically and the meaning would still be preserved for other humans. We simulate such an attack, VertAttack. VertAttack identifies which words a classifier relies on and then rewrites those words vertically. We find that VertAttack greatly drops the accuracy of 4 different transformer models on 5 datasets. For example, on the SST2 dataset, VertAttack drops RoBERTa's accuracy from 94% to 13%. Furthermore, since VertAttack does not replace the word, meaning is easily preserved. We verify this via a human study and find that crowdworkers are able to correctly label 77% of perturbed texts, compared to 81% of the original texts. We believe VertAttack offers a look into how humans might circumvent classifiers in the future and thus inspires work on more robust algorithms.
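The perturbation step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the actual VertAttack selects words by probing the classifier's reliance on them, whereas here the target words are supplied explicitly, and the function names (`write_vertically`, `perturb`) are hypothetical.

```python
def write_vertically(word):
    """Rewrite a word with one character per line, preserving its spelling."""
    return "\n".join(word)

def perturb(text, target_words):
    """Replace each targeted word in the text with its vertical form.

    Assumption: in the real attack, target_words would be the words the
    classifier is found to rely on; here they are given for illustration.
    """
    out = []
    for token in text.split(" "):
        out.append(write_vertically(token) if token in target_words else token)
    return " ".join(out)
```

For example, `perturb("a bad movie", {"bad"})` keeps the text human-readable while the classifier's horizontal tokenizer no longer sees the word "bad" as a single token.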
URL
https://arxiv.org/abs/2404.08538