Abstract
Neural machine translation (NMT) is a widely popular text generation task, yet there is a considerable research gap in the development of privacy-preserving NMT models, despite significant data privacy concerns for NMT systems. Differentially private stochastic gradient descent (DP-SGD) is a popular method for training machine learning models with concrete privacy guarantees; however, the implementation specifics of training a model with DP-SGD are not always clarified in existing work: differing software libraries are used and code bases are not always public, leading to reproducibility issues. To tackle this, we introduce DP-NMT, an open-source framework for carrying out research on privacy-preserving NMT with DP-SGD, bringing together numerous models, datasets, and evaluation metrics in one systematic software package. Our goal is to provide a platform for researchers to advance the development of privacy-preserving NMT systems, keeping the specific details of the DP-SGD algorithm transparent and intuitive to implement. We run a set of experiments on datasets from both general and privacy-related domains to demonstrate our framework in use. We make our framework publicly available and welcome feedback from the community.
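As background for readers unfamiliar with DP-SGD, the core mechanism is per-example gradient clipping followed by calibrated Gaussian noise on the averaged gradient. The sketch below is a minimal NumPy illustration of one such update step; it is not taken from the DP-NMT framework, and all names (`dp_sgd_step`, `clip_norm`, `noise_multiplier`) are illustrative assumptions.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update (illustrative sketch, not the paper's implementation).

    Each per-example gradient is clipped to L2 norm <= clip_norm, the clipped
    gradients are averaged, and Gaussian noise with standard deviation
    noise_multiplier * clip_norm / batch_size is added before the SGD step.
    """
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down only if the gradient exceeds the clipping threshold.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    batch_size = len(clipped)
    avg_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / batch_size,
                       size=avg_grad.shape)
    return params - lr * (avg_grad + noise)

# Toy usage with two per-example gradients of norms 5 and 10:
rng = np.random.default_rng(0)
params = np.zeros(3)
grads = [np.array([3.0, 4.0, 0.0]), np.array([0.0, 0.0, 10.0])]
new_params = dp_sgd_step(params, grads, clip_norm=1.0,
                         noise_multiplier=1.0, lr=0.1, rng=rng)
```

The clipping bounds each individual example's influence on the update, which is what lets the added noise translate into a formal (ε, δ) differential-privacy guarantee via a privacy accountant; the accountant itself is outside the scope of this sketch.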
URL
https://arxiv.org/abs/2311.14465