Abstract
With the increasing ubiquity of cameras and smart sensors, humanity is generating data at an exponential rate. Access to this trove of information, which often covers still-underrepresented use cases (e.g., AI in medical settings), could fuel a new generation of deep-learning tools. However, eager data scientists should first provide satisfactory guarantees regarding the privacy of the individuals present in these untapped datasets. This is especially important for images or videos depicting faces, as their biometric information is the target of most identification methods. While a variety of solutions have been proposed to de-identify such images, they often corrupt other, non-identifying facial attributes that would be relevant for downstream tasks. In this paper, we propose Disguise, a novel algorithm to seamlessly de-identify facial images while ensuring the usability of the altered data. Unlike prior art, we ground our solution in both the differential-privacy and ensemble-learning research domains. Our method extracts the depicted identities and swaps them with fake ones, synthesized via variational mechanisms to maximize obfuscation and non-invertibility, while leveraging supervision from a mixture of experts to disentangle and preserve other utility attributes. We extensively evaluate our method on multiple datasets, demonstrating a higher de-identification rate and greater consistency than prior art across various downstream tasks.
URL
https://arxiv.org/abs/2303.13269