A Little Leak Will Sink a Great Ship: Survey of Transparency for Large Language Models from Start to Finish

2024-03-24 13:21:58
Masahiro Kaneko, Timothy Baldwin

Abstract

Large Language Models (LLMs) are trained on massive web-crawled corpora. This poses risks of leakage, including of personal information, copyrighted texts, and benchmark datasets. Such leakage undermines trust in AI, since it can enable unauthorized generation of content and lead to overestimation of model performance. We establish three criteria for analyzing leakage: (1) leakage rate: the proportion of leaked data in the training data; (2) output rate: the ease with which the model generates leaked data; and (3) detection rate: how well leaked data can be distinguished from non-leaked data. Although the leakage rate is the origin of data leakage issues, it is not well understood how it affects the output rate and detection rate. In this paper, we conduct an experimental survey to elucidate the relationship between the leakage rate and both the output rate and detection rate for personal information, copyrighted texts, and benchmark data. We also propose a self-detection approach that uses few-shot learning, in which an LLM detects whether instances are present in its own training data; in contrast, previous methods do not employ explicit learning. To explore the ease of generating leaked information, we create a dataset of prompts designed to elicit personal information, copyrighted text, and benchmark content from LLMs. Our experiments reveal that LLMs produce leaked information in most cases, even though such data makes up only a small fraction of their training sets, indicating that even small amounts of leaked data can strongly influence outputs. Our self-detection method outperforms existing detection methods.
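
As a concrete illustration, below is a minimal Python sketch of the few-shot self-detection idea described in the abstract: the model is shown exemplars labeled as present or absent from its training data and is then asked to label a new candidate the same way. The exemplar texts, the label wording, and the notion of a query hook are illustrative assumptions, not the paper's exact protocol.

# Minimal sketch of few-shot self-detection prompting (assumed format,
# not the paper's exact protocol). Exemplars are (text, label) pairs
# whose membership in the training corpus is known; the model is then
# asked to label a new candidate in the same way.
FEW_SHOT_EXEMPLARS = [
    ("To be, or not to be, that is the question.", "leaked"),
    ("A newly written sentence that postdates the training cutoff.", "not leaked"),
]

def build_self_detection_prompt(candidate: str) -> str:
    """Assemble a few-shot prompt asking the LLM whether `candidate`
    appeared in its own training data."""
    lines = [
        "Decide whether each text was part of your training data.",
        "Answer 'leaked' or 'not leaked'.",
        "",
    ]
    for text, label in FEW_SHOT_EXEMPLARS:
        lines += [f"Text: {text}", f"Answer: {label}", ""]
    lines += [f"Text: {candidate}", "Answer:"]
    return "\n".join(lines)

if __name__ == "__main__":
    # Sending the prompt to an actual LLM API is left abstract here
    # (hypothetical); we only print the prompt that would be sent.
    print(build_self_detection_prompt("First question of some benchmark..."))

In this framing, the output rate would be measured by how often prompts elicit verbatim leaked text, and the detection rate by the accuracy of the model's "leaked" / "not leaked" answers against ground-truth membership labels.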

URL

https://arxiv.org/abs/2403.16139

PDF

https://arxiv.org/pdf/2403.16139.pdf

