Red-Teaming Segment Anything Model

Abstract

Foundation models have emerged as pivotal tools, tackling many complex tasks through pre-training on vast datasets and subsequent fine-tuning for specific applications. The Segment Anything Model is one of the first and best-known foundation models for computer vision segmentation tasks. This work presents a multi-faceted red-teaming analysis that tests the Segment Anything Model against challenging tasks: (1) We analyze the impact of style transfer on segmentation masks, demonstrating that applying adverse weather conditions and raindrops to dashboard images of city roads significantly distorts the generated masks. (2) We assess whether the model can be used for attacks on privacy, such as recognizing celebrities' faces, and show that the model possesses some undesired knowledge in this task. (3) Finally, we check how robust the model is to adversarial attacks on segmentation masks under text prompts. We not only show the effectiveness of popular white-box attacks and the model's resistance to black-box attacks, but also introduce a novel approach, the Focused Iterative Gradient Attack (FIGA), which combines white-box approaches to construct an efficient attack that modifies fewer pixels. All of our testing methods and analyses indicate a need for enhanced safety measures in foundation models for image segmentation.
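The abstract does not spell out how FIGA works, so the following is only a minimal sketch of the general idea it names: an iterative sign-gradient attack that restricts each step to a few high-gradient pixels, so that fewer pixels end up modified. It is not the paper's algorithm. The toy per-pixel sigmoid "model", the function names, and the parameters `step`, `k`, and `iters` are all hypothetical and chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mask_score_and_grad(image, weights, bias):
    """Toy stand-in for a segmentation model (an assumption, not SAM).

    Each pixel gets a foreground probability sigmoid(w * x + bias).
    Returns the summed probability and its gradient w.r.t. each pixel.
    """
    probs = [sigmoid(w * x + bias) for w, x in zip(weights, image)]
    grad = [w * p * (1.0 - p) for w, p in zip(weights, probs)]
    return sum(probs), grad


def focused_iterative_attack(image, weights, bias, step=0.1, k=2, iters=5):
    """Hypothetical focused attack: at each iteration, take a sign-gradient
    step on only the k pixels with the largest gradient magnitude."""
    adv = list(image)   # work on a copy; the input is left untouched
    touched = set()     # indices of pixels that were ever selected
    for _ in range(iters):
        _, grad = mask_score_and_grad(adv, weights, bias)
        top = sorted(range(len(adv)), key=lambda i: abs(grad[i]), reverse=True)[:k]
        for i in top:
            sign = (grad[i] > 0) - (grad[i] < 0)
            # Step against the gradient to lower the mask score; clip to [0, 1].
            adv[i] = min(1.0, max(0.0, adv[i] - step * sign))
            touched.add(i)
    return adv, touched


if __name__ == "__main__":
    image = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    weights = [4.0, 3.0, 2.0, 1.0, 0.5, 0.25]
    before, _ = mask_score_and_grad(image, weights, -1.0)
    adv, touched = focused_iterative_attack(image, weights, -1.0)
    after, _ = mask_score_and_grad(adv, weights, -1.0)
    print(f"mask score {before:.3f} -> {after:.3f}, pixels selected: {sorted(touched)}")
```

In a real white-box setting the gradient would come from backpropagation through the segmentation model rather than from a closed-form expression. The point of the sketch is only the selection step, which bounds how many pixels each iteration may change.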

URL

https://arxiv.org/abs/2404.02067

PDF

https://arxiv.org/pdf/2404.02067.pdf
