FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes

2024-02-28 13:50:46
Ziying Pan, Kun Wang, Gang Li, Feihong He, Xiwang Li, Yongxuan Lai

Abstract

Class-conditional image generation based on diffusion models is known for producing high-quality and diverse images. However, most prior work focuses on general categories, e.g., the 1,000 classes in ImageNet-1k. A more challenging task, large-scale fine-grained image generation, remains largely unexplored. In this work, we present FineDiffusion, a parameter-efficient strategy for fine-tuning large pre-trained diffusion models that scales to fine-grained image generation with 10,000 categories. FineDiffusion significantly accelerates training and reduces storage overhead by fine-tuning only the tiered class embedder, the bias terms, and the normalization layers' parameters. To further improve generation quality for fine-grained categories, we propose a novel sampling method that uses superclass-conditioned guidance, specifically tailored to fine-grained categories, in place of conventional classifier-free guidance. Compared to full fine-tuning, FineDiffusion achieves a 1.56x training speed-up and requires storing only 1.77% of the total model parameters, while achieving a state-of-the-art FID of 9.776 on image generation with 10,000 classes. Extensive qualitative and quantitative experiments demonstrate the superiority of our method over other parameter-efficient fine-tuning methods. The code and more generated results are available at our project website: this https URL.
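The abstract does not give the guidance formula, so the following is only a minimal sketch of how superclass-conditioned guidance might differ from classifier-free guidance. It assumes both use the usual extrapolation form, reference + scale * (fine - reference), where the reference prediction is unconditional for classifier-free guidance and superclass-conditioned for the proposed sampling. The function name `guided_noise`, the guidance scale, and the toy noise values are all hypothetical and are not taken from the paper or its code.

```python
from typing import List, Sequence


def guided_noise(eps_fine: Sequence[float],
                 eps_ref: Sequence[float],
                 scale: float) -> List[float]:
    """Extrapolate from a reference noise prediction toward the fine-grained one.

    Assumed form (not confirmed by the abstract):
        eps_ref + scale * (eps_fine - eps_ref)
    """
    if len(eps_fine) != len(eps_ref):
        raise ValueError("predictions must have the same length")
    return [r + scale * (f - r) for f, r in zip(eps_fine, eps_ref)]


# Hypothetical toy noise predictions for one latent, flattened to 3 values.
eps_fine = [0.5, -1.0, 2.0]    # conditioned on the fine-grained class
eps_uncond = [0.0, 0.0, 0.0]   # unconditional (classifier-free guidance reference)
eps_super = [0.25, -0.5, 1.0]  # conditioned on the superclass (assumed reference)

# Classifier-free guidance: the reference is the unconditional prediction.
cfg = guided_noise(eps_fine, eps_uncond, scale=4.0)

# Superclass-conditioned guidance: the reference is the superclass prediction.
scg = guided_noise(eps_fine, eps_super, scale=4.0)

print(cfg)
print(scg)
```

Under this assumption, the guidance direction is the difference between the fine-grained and superclass predictions, so it emphasizes what separates a category from its siblings rather than what separates it from an unconditional sample. With a scale of 1 the function returns the fine-grained prediction unchanged.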

URL

https://arxiv.org/abs/2402.18331

PDF

https://arxiv.org/pdf/2402.18331.pdf

