Abstract
Robustness certification, which aims to formally certify the predictions of neural networks against adversarial inputs, has become an important tool for safety-critical applications. Despite considerable progress, existing certification methods are limited to elementary architectures, such as convolutional networks, recurrent networks, and more recently Transformers, on benchmark datasets such as MNIST. In this paper, we focus on the robustness certification of scene text recognition (STR), which is a complex and extensively deployed image-based sequence prediction problem. We tackle three types of STR model architectures, including the standard STR pipelines and the Vision Transformer. We propose STR-Cert, the first certification method for STR models, which significantly extends the DeepPoly polyhedral verification framework by deriving novel polyhedral bounds and algorithms for key STR model components. Finally, we certify and compare STR models on six datasets, demonstrating the efficiency and scalability of robustness certification, particularly for the Vision Transformer.
URL
https://arxiv.org/abs/2401.05338