Abstract
Data privacy is an important issue for "machine learning as a service" providers. We focus on the problem of membership inference attacks: given a data sample and black-box access to a model's API, determine whether the sample was part of the model's training data. Our contribution is an investigation of this problem in the context of sequence-to-sequence models, which are important in applications such as machine translation and video captioning. We define the membership inference problem for sequence generation, provide an open dataset based on state-of-the-art machine translation models, and report initial results on whether these models leak private information under several kinds of membership inference attacks.
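To make the attack setting concrete, here is a minimal sketch of one common family of membership inference attacks: a threshold attack on the model's sequence likelihood. The paper's own attacks are not specified here; this sketch assumes a hypothetical black-box scoring API (`toy_seq2seq_score`, `THRESHOLD` are illustrative stand-ins, not from the paper).

```python
# Toy stand-in for a seq2seq model's training set: translation pairs
# the model has "memorized". In a real attack this set is hidden from
# the attacker, who only sees the model's API.
TRAINING_SET = {
    ("guten morgen", "good morning"),
    ("danke schoen", "thank you"),
}

def toy_seq2seq_score(source: str, target: str) -> float:
    """Hypothetical black-box API call: average per-token
    log-likelihood of `target` given `source`. Memorized pairs
    score higher, mimicking overfitting."""
    return -0.1 if (source, target) in TRAINING_SET else -2.5

THRESHOLD = -1.0  # attacker-chosen decision boundary

def infer_membership(source: str, target: str) -> bool:
    """Claim the pair was in the training data if the model assigns
    it a likelihood above the threshold."""
    return toy_seq2seq_score(source, target) > THRESHOLD

print(infer_membership("guten morgen", "good morning"))  # member
print(infer_membership("wie geht es", "how are you"))    # non-member
```

The attack succeeds only to the extent that the model scores training samples systematically higher than unseen ones; a well-generalized model narrows that gap and defeats the threshold.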
URL
https://arxiv.org/abs/1904.05506