ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

2024-03-04 14:36:56
Jiaxiang Cheng, Pan Xie, Xin Xia, Jiashi Li, Jie Wu, Yuxi Ren, Huixia Li, Xuefeng Xiao, Min Zheng, Lean Fu

Abstract

Recent advances in text-to-image models (e.g., Stable Diffusion) and corresponding personalization techniques (e.g., DreamBooth and LoRA) enable individuals to generate high-quality and imaginative images. However, these models often struggle to generate images at resolutions outside their training domain. To overcome this limitation, we present the Resolution Adapter (ResAdapter), a domain-consistent adapter designed for diffusion models to generate images with unrestricted resolutions and aspect ratios. Unlike other multi-resolution generation methods that process images of static resolution with complex post-processing operations, ResAdapter directly generates images at dynamic resolutions. In particular, after gaining a deep understanding of pure resolution priors, ResAdapter trained on a general dataset generates resolution-free images with personalized diffusion models while preserving their original style domain. Comprehensive experiments demonstrate that ResAdapter, with only 0.5M, can process images with flexible resolutions for arbitrary diffusion models. Further experiments demonstrate that ResAdapter is compatible with other modules (e.g., ControlNet, IP-Adapter and LCM-LoRA) for image generation across a broad range of resolutions, and can be integrated into other multi-resolution models (e.g., ElasticDiffusion) to efficiently generate higher-resolution images. Project link is this https URL

URL

https://arxiv.org/abs/2403.02084

PDF

https://arxiv.org/pdf/2403.02084.pdf

