Abstract
Recent years have seen increasing concern about the private inference of NLP services and Transformer models. However, existing two-party privacy-preserving methods consider only NLU scenarios, while private inference for text generation tasks such as translation, dialogue, and code completion remains unsolved. Moreover, when migrated to NLG models, existing privacy-preserving methods perform poorly in terms of inference speed and suffer from convergence problems during training. To address these issues, we propose MERGE, a fast private text generation framework for Transformer-based language models. Specifically, MERGE reuses the output hidden state as the word embedding to bypass the embedding computation, and reorganizes the linear operations in the Transformer module to accelerate the forward pass. With these two optimizations, extensive experiments show that MERGE achieves a 26.5x speedup at sequence length 512 and reduces communication by 80%, yielding up to a 10x speedup over existing state-of-the-art models.
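The embedding-reuse idea can be illustrated with a short PyTorch sketch. This is a minimal illustration under assumed names and shapes, not the paper's implementation: the modules (decoder_layer, lm_head) and dimensions are hypothetical, and the causal mask is omitted for brevity. The point is that each generation step replaces the secure argmax plus embedding lookup (both expensive under two-party computation) with a direct reuse of the last output hidden state as the next input embedding.

    # Sketch of the hidden-state-reuse optimization (illustrative only).
    import torch
    import torch.nn as nn

    vocab_size, d_model = 1000, 64
    embedding = nn.Embedding(vocab_size, d_model)
    # Stand-in for one Transformer block; causal mask omitted for brevity.
    decoder_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
    lm_head = nn.Linear(d_model, vocab_size)

    def standard_step(hidden):
        # Standard decoding: costly under MPC, since it needs a secure
        # argmax over the vocabulary and a one-hot-matmul embedding lookup.
        logits = lm_head(hidden[:, -1:])   # (B, 1, vocab_size)
        token = logits.argmax(dim=-1)      # secure argmax required
        return embedding(token)            # (B, 1, d_model)

    def merge_step(hidden):
        # MERGE-style step: the last hidden state itself becomes the next
        # "embedding", bypassing argmax and the embedding table entirely.
        return hidden[:, -1:]              # (B, 1, d_model)

    x = embedding(torch.randint(0, vocab_size, (1, 8)))  # prompt embeddings
    for _ in range(4):                                   # generate 4 steps
        h = decoder_layer(x)
        x = torch.cat([x, merge_step(h)], dim=1)         # reuse hidden state

In cleartext inference the saving is negligible, but under secret sharing the argmax and embedding lookup dominate per-token cost, which is why skipping them matters.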
URL
https://arxiv.org/abs/2305.15769