
CAT: Contrastive Adapter Training for Personalized Image Generation

2024-04-11 08:36:13
Jae Wan Park, Sang Hyun Park, Jun Young Koh, Junha Lee, Min Song

Abstract

The emergence of various adapters, including Low-Rank Adaptation (LoRA) adopted from natural language processing, has allowed diffusion models to personalize image generation at low cost. However, owing to challenges such as limited datasets, insufficient regularization, and scarce computational resources, adapter training often yields unsatisfactory results and corrupts the backbone model's prior knowledge. One well-known symptom is a loss of diversity in object generation, especially within the same class, where the model produces almost identical objects with only minor variations. This limits the model's generation capabilities. To address this issue, we present Contrastive Adapter Training (CAT), a simple yet effective strategy that enhances adapter training by applying a CAT loss. Our approach helps preserve the base model's original knowledge when the model initiates adapters. Furthermore, we introduce the Knowledge Preservation Score (KPS) to evaluate CAT's ability to retain that prior knowledge. We compare CAT's improvement both qualitatively and quantitatively. Finally, we discuss CAT's potential for multi-concept adapters and optimization.

URL

https://arxiv.org/abs/2404.07554

PDF

https://arxiv.org/pdf/2404.07554.pdf

