Abstract
With the popularity of large language models (LLMs), undesirable societal problems such as misinformation production and academic misconduct have become more severe, making LLM-generated text detection more important than ever. Although existing methods have made remarkable progress, a new challenge posed by text from privately tuned LLMs remains underexplored. Users can easily obtain a private LLM by fine-tuning an open-source model on private corpora, causing a significant performance drop for existing detectors in practice. To address this issue, we propose PhantomHunter, an LLM-generated text detector specialized in detecting text from unseen, privately tuned LLMs. Its family-aware learning framework captures family-level traits shared across base models and their derivatives instead of memorizing individual characteristics. Experiments on data from the LLaMA, Gemma, and Mistral families show its superiority over 7 baselines and 3 industrial services, with F1 scores above 96%.
URL
https://arxiv.org/abs/2506.15683