Abstract
Vision-Language Models (VLMs) combine visual and textual understanding, making them well suited for diverse tasks like generating image captions and answering visual questions across various domains. However, these capabilities are built upon training on large amounts of uncurated data crawled from the web. Such data may include sensitive information that VLMs could memorize and leak, raising significant privacy concerns. In this paper, we assess whether these vulnerabilities exist, focusing on identity leakage. Our study leads to three key findings: (i) VLMs leak identity information, even when the vision-language alignment and the fine-tuning use anonymized data; (ii) context has little influence on identity leakage; (iii) simple, widely used anonymization techniques, like blurring, are not sufficient to address the problem. These findings underscore the urgent need for robust privacy protection strategies when deploying VLMs. Ethical awareness and responsible development practices are essential to mitigate these risks.
Abstract (translated)
Vision-Language Models (VLMs) combine visual and textual understanding, making them well suited for a variety of tasks such as generating image captions and answering visual questions across various domains. However, these capabilities rest on training with large amounts of uncurated data crawled from the web, which may include sensitive information that VLMs can memorize and leak, raising significant privacy concerns. In this paper, we assess whether these vulnerabilities exist, focusing on identity leakage. Our study yields three key findings: (i) VLMs leak identity information even when the vision-language alignment and fine-tuning use anonymized data; (ii) context has little influence on identity leakage; (iii) simple, widely used anonymization techniques, such as blurring, are not sufficient to address the problem. These findings underscore the urgent need for robust privacy protection strategies when deploying VLMs. Ethical awareness and responsible development practices are essential to mitigate these risks.
URL
https://arxiv.org/abs/2408.01228