Abstract
The rapid expansion of large-scale text-to-image diffusion models has raised growing concerns regarding their potential misuse in creating harmful or misleading content. In this paper, we introduce MACE, a finetuning framework for the task of mass concept erasure. This task aims to prevent models from generating images that embody unwanted concepts when prompted. Existing concept erasure methods are typically restricted to handling fewer than five concepts simultaneously and struggle to find a balance between erasing concept synonyms (generality) and maintaining unrelated concepts (specificity). In contrast, MACE differs by successfully scaling the erasure scope up to 100 concepts and by achieving an effective balance between generality and specificity. This is achieved by leveraging closed-form cross-attention refinement along with LoRA finetuning, collectively eliminating the information of undesirable concepts. Furthermore, MACE integrates multiple LoRAs without mutual interference. We conduct extensive evaluations of MACE against prior methods across four different tasks: object erasure, celebrity erasure, explicit content erasure, and artistic style erasure. Our results reveal that MACE surpasses prior methods in all evaluated tasks. Code is available at this https URL.
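The closed-form cross-attention refinement mentioned above can be illustrated with a generic ridge-regression-style update of a linear projection (such as a cross-attention key or value matrix). This is a minimal sketch of that general technique, not MACE's exact formulation: the function name, the regularization weight `lam`, and the choice of target embeddings are all illustrative assumptions.

```python
import numpy as np

def closed_form_refinement(W_old, E_erase, T_erase, E_preserve, lam=0.1):
    """Illustrative closed-form update of a projection matrix W (hypothetical
    helper, not MACE's actual implementation).

    Minimizes  sum_i ||W e_i - t_i||^2  +  lam * sum_j ||W p_j - W_old p_j||^2,
    where columns of E_erase are embeddings of concepts to erase, columns of
    T_erase are their desired replacement outputs, and columns of E_preserve
    are embeddings whose outputs should remain unchanged.
    """
    d_in = E_erase.shape[0]
    # Normal equations of the ridge problem above, solved in closed form.
    A = T_erase @ E_erase.T + lam * (W_old @ E_preserve) @ E_preserve.T
    B = E_erase @ E_erase.T + lam * E_preserve @ E_preserve.T
    # Small diagonal term keeps B invertible when the embeddings are rank-deficient.
    return A @ np.linalg.inv(B + 1e-6 * np.eye(d_in))
```

With a small `lam`, the updated matrix maps the erased-concept embeddings to their targets; with a large `lam`, it stays close to `W_old` on the preserved embeddings, which mirrors the generality-versus-specificity trade-off discussed in the abstract.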
URL
https://arxiv.org/abs/2403.06135