Abstract
Neural network-based image classifiers are powerful tools for computer vision tasks, but they can inadvertently reveal sensitive attribute information about their classes, raising privacy concerns for the individuals those classes represent. To investigate this privacy leakage, we introduce the first Class Attribute Inference Attack (Caia), which leverages recent advances in text-to-image synthesis to infer sensitive attributes of individual classes in a black-box setting while remaining competitive with related white-box attacks. Our extensive experiments in the face recognition domain show that Caia can accurately infer undisclosed sensitive attributes, such as an individual's hair color, gender, and racial appearance, that are not part of the training labels. Interestingly, we demonstrate that adversarially robust models are even more vulnerable to this privacy leakage than standard models, indicating a trade-off between robustness and privacy.
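The abstract only outlines the attack at a high level, so the following is a minimal, illustrative sketch of the black-box idea rather than the paper's exact procedure: candidate images that vary only in the sensitive attribute are synthesized with a text-to-image model, the target classifier is queried on them, and the attribute whose images receive the highest target-class confidence is inferred. The Stable Diffusion backend, prompt template, attribute list, and averaged-confidence scoring rule below are all assumptions made for illustration.

```python
# Illustrative sketch of a class attribute inference attack (assumptions noted inline).
import torch
from diffusers import StableDiffusionPipeline
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: Stable Diffusion as the text-to-image backend; the paper's choice of
# generator and prompts may differ from this setup.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
).to(device)

# Hypothetical attribute values and prompt template for the hair-color example.
attribute_values = ["black hair", "blond hair", "brown hair", "red hair"]
prompt_template = "a photo of the face of a person with {}"

# Assumption: the target classifier takes 224x224 inputs; any normalization it
# expects would also have to be replicated here.
to_tensor = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

@torch.no_grad()
def infer_attribute(classifier, target_class, images_per_value=8):
    """Return the attribute value whose synthetic images the black-box
    classifier assigns the highest average target-class confidence."""
    scores = []
    for value in attribute_values:
        # Synthesize candidates that differ only in the sensitive attribute.
        images = pipe(prompt_template.format(value),
                      num_images_per_prompt=images_per_value).images
        batch = torch.stack([to_tensor(img) for img in images]).to(device)
        logits = classifier(batch)  # black-box access: outputs only
        scores.append(logits.softmax(dim=-1)[:, target_class].mean().item())
    return attribute_values[scores.index(max(scores))]
```

Because the sketch consumes only model outputs, it respects the black-box threat model described in the abstract; matching the classifier's true preprocessing is assumed away here.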
URL
https://arxiv.org/abs/2303.09289