HydraText: Multi-objective Optimization for Adversarial Textual Attack

2021-11-02 12:10:58

Shengcai Liu, Ning Lu, Cheng Chen, Chao Qian, Ke Tang

arXiv_CL

Abstract

The field of adversarial textual attack has grown significantly in recent years, and the commonly considered objective is to craft adversarial examples that successfully fool the target models. However, the imperceptibility of attacks, which is also an essential objective, is often left out by previous studies. In this work, we advocate considering both objectives at the same time, and propose a novel multi-objective optimization approach (dubbed HydraText) with a provable performance guarantee to achieve successful attacks with high imperceptibility. We demonstrate the efficacy of HydraText through extensive experiments under both score-based and decision-based settings, involving five modern NLP models across five benchmark datasets. In comparison to existing state-of-the-art attacks, HydraText consistently achieves higher success rates, lower modification rates, and higher semantic similarity to the original texts, all at the same time. A human evaluation study shows that the adversarial examples crafted by HydraText maintain validity and naturalness well. Finally, these examples also exhibit good transferability and can bring notable robustness improvement to the target models through adversarial training.

URL

https://arxiv.org/abs/2111.01528

PDF

https://arxiv.org/pdf/2111.01528.pdf