BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

2024-04-06 07:53:49
Gwanghyun Kim, Hayeon Kim, Hoigi Seo, Dong Un Kang, Se Young Chun

Abstract

Generating higher-resolution human-centric scenes with details and controls remains a challenge for existing text-to-image diffusion models. This challenge stems from limited training image size, limited text encoder capacity (a fixed token limit), and the inherent difficulty of generating complex scenes involving multiple humans. Current methods address only the training size limit, and they often yield human-centric scenes with severe artifacts. We propose BeyondScene, a novel framework that overcomes these limitations and generates higher-resolution (over 8K) human-centric scenes with exceptional text-image correspondence and naturalness using existing pretrained diffusion models. BeyondScene employs a staged, hierarchical approach. It first generates a detailed base image that focuses on the elements crucial to creating instances of multiple humans and on detailed descriptions beyond the token limit of the diffusion model. It then converts the base image into a higher-resolution output that exceeds the training image size and incorporates text-aware and instance-aware details, using our novel instance-aware hierarchical enlargement process, which consists of our proposed high-frequency injected forward diffusion and adaptive joint diffusion. BeyondScene surpasses existing methods in correspondence with detailed text descriptions and in naturalness, paving the way for advanced applications in higher-resolution human-centric scene creation beyond the capacity of pretrained diffusion models without costly retraining. Project page: this https URL.
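The abstract does not spell out how an output larger than the training image size is produced. One common way to do this with a fixed-size pretrained model is joint diffusion over overlapping views: each view is denoised separately and the results are averaged where views overlap. The sketch below illustrates only that general tiling-and-averaging idea in plain Python. It is not the paper's adaptive variant, and `view_starts`, `fuse_views`, and the `denoise_view` callback are hypothetical names introduced here for illustration. It assumes `stride <= window` so that the views leave no gaps.

```python
def view_starts(length, window, stride):
    """Start offsets of windows of size `window` that cover [0, length)."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window, stride))
    starts.append(length - window)  # final window sits flush with the edge
    return starts


def fuse_views(height, width, window, stride, denoise_view):
    """Average per-view outputs wherever views overlap.

    `denoise_view(top, left, h, w)` is a stand-in for one denoising step
    applied to a single view; it must return an h x w nested list.
    """
    h = min(window, height)
    w = min(window, width)
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top in view_starts(height, window, stride):
        for left in view_starts(width, window, stride):
            patch = denoise_view(top, left, h, w)
            for i in range(h):
                for j in range(w):
                    total[top + i][left + j] += patch[i][j]
                    count[top + i][left + j] += 1
    # Every pixel is covered by at least one view, so count is never zero.
    return [
        [total[i][j] / count[i][j] for j in range(width)]
        for i in range(height)
    ]
```

In a real pipeline the callback would run the pretrained model on a latent crop at each timestep. A smaller stride gives more overlap and smoother seams, at the cost of more model calls.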

URL

https://arxiv.org/abs/2404.04544

PDF

https://arxiv.org/pdf/2404.04544.pdf

