Abstract
Large Language Models (LLMs) are trained on large-scale web data, which makes it difficult to trace what each individual text contributes. This creates a risk that inappropriate data, such as benchmarks, personal information, and copyrighted texts, leaks into the training data. Membership Inference Attacks (MIAs), which determine whether a given text is included in a model's training data, have therefore been attracting attention. Previous studies of MIAs showed that likelihood-based classification is effective for detecting leaks in LLMs. However, existing methods cannot be applied to some proprietary models, such as ChatGPT or Claude 3, because the likelihood is unavailable to the user. In this study, we propose SaMIA, an MIA based on a Sampling-based Pseudo-Likelihood (SPL) that is calculated using only the text generated by an LLM. SaMIA treats the target text as the reference text and multiple outputs from the LLM as text samples, calculates the degree of $n$-gram match between them as the SPL, and uses it to determine whether the text is a member of the training data. Even without likelihoods, SaMIA performed on par with existing likelihood-based methods.
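The scoring step described in the abstract can be illustrated with a minimal sketch. The abstract only states that the degree of $n$-gram match between the target text and several sampled outputs serves as the pseudo-likelihood; the specific choices below (whitespace tokenization, clipped $n$-gram recall against the reference, averaging over samples, a fixed decision threshold, and all function names) are assumptions made for illustration and are not taken from the paper. Obtaining the samples from an LLM is left out.

```python
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams in a token sequence (empty if the text is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_recall(reference: str, sample: str, n: int = 1) -> float:
    """Fraction of the reference's n-grams that also occur in the sample.

    Assumption: whitespace tokenization and clipped counts; the abstract
    does not specify the exact matching metric.
    """
    ref = ngrams(reference.split(), n)
    hyp = ngrams(sample.split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, hyp[gram]) for gram, count in ref.items())
    return matched / total


def pseudo_likelihood(reference: str, samples: List[str], n: int = 1) -> float:
    """Average n-gram match between the target text and the sampled outputs."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(ngram_recall(reference, s, n) for s in samples) / len(samples)


def is_member(reference: str, samples: List[str],
              threshold: float = 0.5, n: int = 1) -> bool:
    """Flag the text as a likely training member when the score reaches the threshold.

    Assumption: the threshold value 0.5 is an arbitrary placeholder.
    """
    return pseudo_likelihood(reference, samples, n) >= threshold


if __name__ == "__main__":
    target = "the cat sat on the mat"
    # In practice these would be several generations sampled from the LLM.
    outputs = ["the cat sat on the mat", "a dog ran in the park"]
    print(pseudo_likelihood(target, outputs), is_member(target, outputs))
```

The intuition is that a model that has seen the target text tends to reproduce more of its $n$-grams across samples, so a higher average match is treated as evidence of membership. In a real setting the threshold would need to be calibrated on texts whose membership is known.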
Abstract (translated)
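Large language models (LLMs) are trained on large-scale web datasets, which makes it difficult to understand the contribution of each text. This can lead to inappropriate data, such as benchmark data, personal information, and copyrighted texts, leaking into the training data. Membership inference attacks (MIAs) have attracted attention. Previous studies showed that likelihood-based classification is effective for detecting leaks in LLMs. However, existing methods cannot be applied to some proprietary models such as ChatGPT or Claude 3, because the likelihood is not accessible. In this paper, we propose SaMIA, a membership inference attack based on a sampling-based pseudo-likelihood (SPL). The method computes the SPL using only text generated by the LLM: it treats the target text as the reference text and multiple LLM outputs as text samples, measures the degree of n-gram match, and determines whether the text is a member of the training data. Even without likelihoods, SaMIA performs on par with existing likelihood-based methods.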
URL
https://arxiv.org/abs/2404.11262