Paper Reading AI Learner

Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models

2025-03-27 19:42:52
Haoming Cai, Tsung-Wei Huang, Shiv Gehlot, Brandon Y. Feng, Sachin Shah, Guan-Ming Su, Christopher Metzler

Abstract

Text-to-image diffusion models excel at generating diverse portraits, but lack intuitive shadow control. Existing editing approaches, as post-processing, struggle to offer effective manipulation across diverse styles. Additionally, these methods either rely on expensive real-world light-stage data collection or require extensive computational resources for training. To address these limitations, we introduce Shadow Director, a method that extracts and manipulates hidden shadow attributes within well-trained diffusion models. Our approach uses a small estimation network that requires only a few thousand synthetic images and hours of training, with no costly real-world light-stage data needed. Shadow Director enables parametric and intuitive control over shadow shape, placement, and intensity during portrait generation while preserving artistic integrity and identity across diverse styles. Despite training only on synthetic data built on real-world identities, it generalizes effectively to generated portraits with diverse styles, making it a more accessible and resource-friendly solution.

URL

https://arxiv.org/abs/2503.21943

PDF

https://arxiv.org/pdf/2503.21943.pdf

