DP-NMT: Scalable Differentially-Private Machine Translation

Abstract

Neural machine translation (NMT) is a widely popular text generation task, yet there is a considerable research gap in the development of privacy-preserving NMT models, despite significant data privacy concerns for NMT systems. Differentially private stochastic gradient descent (DP-SGD) is a popular method for training machine learning models with concrete privacy guarantees; however, the implementation specifics of training a model with DP-SGD are not always clarified in existing models, with differing software libraries used and code bases not always being public, leading to reproducibility issues. To tackle this, we introduce DP-NMT, an open-source framework for carrying out research on privacy-preserving NMT with DP-SGD, bringing together numerous models, datasets, and evaluation metrics in one systematic software package. Our goal is to provide a platform for researchers to advance the development of privacy-preserving NMT systems, keeping the specific details of the DP-SGD algorithm transparent and intuitive to implement. We run a set of experiments on datasets from both general and privacy-related domains to demonstrate our framework in use. We make our framework publicly available and welcome feedback from the community.
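
To make the DP-SGD update mentioned above concrete, here is a minimal sketch of a single step: compute a gradient per example, clip each one to a fixed L2 norm, sum them, add Gaussian noise scaled to the clipping norm, and average. This is not the DP-NMT framework's code. It assumes a toy linear model with squared-error loss in place of an NMT model, and the names `clip`, `dp_sgd_step`, `max_norm`, and `noise_multiplier` are hypothetical, chosen for illustration only. Privacy accounting (tracking the resulting epsilon and delta) is left out.

```python
import math
import random


def clip(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def dp_sgd_step(weights, batch, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update for a toy linear model (illustrative assumption)."""
    summed = [0.0] * len(weights)
    for x, y in batch:
        # Per-example gradient of the squared error (pred - y)^2.
        pred = sum(w * xi for w, xi in zip(weights, x))
        grad = [2.0 * (pred - y) * xi for xi in x]
        # Clip before aggregation so each example's influence is bounded.
        clipped = clip(grad, max_norm)
        summed = [s + c for s, c in zip(summed, clipped)]
    # Gaussian noise with standard deviation proportional to the clipping norm.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    # Average over the batch and take a gradient step.
    return [w - lr * n / len(batch) for w, n in zip(weights, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
    w = [0.0, 0.0]
    for _ in range(100):
        w = dp_sgd_step(w, data, lr=0.1, max_norm=1.0,
                        noise_multiplier=0.5, rng=rng)
    print(w)
```

Clipping each example's gradient before summation, rather than clipping the batch average, is what bounds any single example's contribution, and the noise scale is tied to that bound.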

URL

https://arxiv.org/abs/2311.14465

PDF

https://arxiv.org/pdf/2311.14465.pdf
