Abstract
The study of privacy-preserving Natural Language Processing (NLP) has gained increasing attention in recent years. One promising avenue is the integration of Differential Privacy (DP) into NLP, which has brought about innovative methods in a variety of application settings. Of particular note are $\textit{word-level Metric Local Differential Privacy (MLDP)}$ mechanisms, which obfuscate potentially sensitive input text by performing word-by-word $\textit{perturbations}$. Although these methods have shown promising results in empirical tests, they have two major drawbacks: (1) the inevitable loss of utility due to the addition of noise, and (2) the computational expense of running these mechanisms on high-dimensional word embeddings. In this work, we aim to address these challenges by proposing $\texttt{1-Diffractor}$, a new mechanism that achieves significant speedups over previous mechanisms while still demonstrating strong utility- and privacy-preserving capabilities. We evaluate $\texttt{1-Diffractor}$ for utility on several NLP tasks, for theoretical and task-based privacy, and for efficiency in terms of speed and memory. $\texttt{1-Diffractor}$ shows significant improvements in efficiency, while maintaining competitive utility and privacy scores across all conducted comparative tests against previous MLDP mechanisms. Our code is made available at: this https URL.
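To make the word-by-word perturbation idea concrete, the following is a minimal sketch of a generic word-level metric-LDP mechanism in the style of prior work (not $\texttt{1-Diffractor}$ itself): each word's embedding is perturbed with noise whose scale is governed by a privacy parameter $\epsilon$, and the noisy vector is mapped back to the nearest vocabulary word. The toy 2-D vocabulary and the `perturb_word` helper are illustrative assumptions; real mechanisms operate on high-dimensional embeddings such as GloVe.

```python
import numpy as np

# Toy vocabulary with 2-D embeddings (illustrative only; real word-level
# MLDP mechanisms use high-dimensional pretrained embeddings).
VOCAB = {
    "good":  np.array([1.0, 1.0]),
    "great": np.array([1.1, 0.9]),
    "bad":   np.array([-1.0, -1.0]),
    "awful": np.array([-1.1, -0.9]),
}

def perturb_word(word, epsilon, rng):
    """Perturb one word: add metric-DP noise to its embedding, then
    return the vocabulary word nearest to the noisy vector."""
    vec = VOCAB[word]
    d = vec.shape[0]
    # Noise with a uniformly random direction and a Gamma(d, 1/epsilon)
    # magnitude yields metric-DP guarantees under Euclidean distance.
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=d, scale=1.0 / epsilon)
    noisy = vec + magnitude * direction
    # Nearest-neighbour search over the whole vocabulary: this step is
    # what makes high-dimensional mechanisms computationally expensive.
    return min(VOCAB, key=lambda w: np.linalg.norm(VOCAB[w] - noisy))

rng = np.random.default_rng(0)
sentence = ["good", "bad"]
print([perturb_word(w, epsilon=10.0, rng=rng) for w in sentence])
```

Smaller $\epsilon$ means larger noise and hence stronger privacy but lower utility, since the perturbed word drifts further from the original; the per-word nearest-neighbour search over the vocabulary is the efficiency bottleneck that motivates faster mechanisms.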
URL
https://arxiv.org/abs/2405.01678