A Little Leak Will Sink a Great Ship: Survey of Transparency for Large Language Models from Start to Finish

2024-03-24 13:21:58
Masahiro Kaneko, Timothy Baldwin

Abstract

Large Language Models (LLMs) are trained on massive web-crawled corpora. This poses risks of leakage, including of personal information, copyrighted texts, and benchmark datasets. Such leakage undermines trust in AI, since it can enable unauthorized generation of content and lead to overestimation of model performance. We establish three criteria for analyzing leakage: (1) leakage rate: the proportion of leaked data in the training data; (2) output rate: the ease with which the model generates leaked data; and (3) detection rate: how well leaked data can be distinguished from non-leaked data. Although the leakage rate is the origin of data leakage issues, it is not well understood how it affects the output rate and detection rate. In this paper, we conduct an experimental survey to elucidate the relationship between the leakage rate and both the output rate and detection rate for personal information, copyrighted texts, and benchmark data. We also propose a self-detection approach that uses few-shot learning, in which an LLM detects whether instances are present in its own training data; in contrast, previous methods do not employ explicit learning. To explore the ease of generating leaked information, we create a dataset of prompts designed to elicit personal information, copyrighted text, and benchmark content from LLMs. Our experiments reveal that LLMs produce leaked information in most cases, even though such data makes up only a small fraction of their training sets, indicating that even small amounts of leaked data can strongly influence outputs. Our self-detection method outperforms existing detection methods.
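
As a concrete illustration, below is a minimal Python sketch of the few-shot self-detection idea described in the abstract: the model is shown exemplars labeled as present or absent from its training data and is then asked to label a new candidate the same way. The exemplar texts, the label wording, and the notion of a query hook are illustrative assumptions, not the paper's exact protocol.

# Minimal sketch of few-shot self-detection prompting (assumed format,
# not the paper's exact protocol). Exemplars are (text, label) pairs
# whose membership in the training corpus is known; the model is then
# asked to label a new candidate in the same way.
FEW_SHOT_EXEMPLARS = [
    ("To be, or not to be, that is the question.", "leaked"),
    ("A newly written sentence that postdates the training cutoff.", "not leaked"),
]

def build_self_detection_prompt(candidate: str) -> str:
    """Assemble a few-shot prompt asking the LLM whether `candidate`
    appeared in its own training data."""
    lines = [
        "Decide whether each text was part of your training data.",
        "Answer 'leaked' or 'not leaked'.",
        "",
    ]
    for text, label in FEW_SHOT_EXEMPLARS:
        lines += [f"Text: {text}", f"Answer: {label}", ""]
    lines += [f"Text: {candidate}", "Answer:"]
    return "\n".join(lines)

if __name__ == "__main__":
    # Sending the prompt to an actual LLM API is left abstract here
    # (hypothetical); we only print the prompt that would be sent.
    print(build_self_detection_prompt("First question of some benchmark..."))

In this framing, the output rate would be measured by how often prompts elicit verbatim leaked text, and the detection rate by the accuracy of the model's "leaked" / "not leaked" answers against ground-truth membership labels.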

URL

https://arxiv.org/abs/2403.16139

PDF

https://arxiv.org/pdf/2403.16139.pdf

