TexSliders: Diffusion-Based Texture Editing in CLIP Space

2024-05-01 17:57:21
Julia Guerrero-Viu, Milos Hasan, Arthur Roullier, Midhun Harikumar, Yiwei Hu, Paul Guerrero, Diego Gutierrez, Belen Masia, Valentin Deschaintre

Abstract

Generative models have enabled intuitive image creation and manipulation using natural language. In particular, diffusion models have recently shown remarkable results for natural image editing. In this work, we propose to apply diffusion techniques to edit textures, a specific class of images that are an essential part of 3D content creation pipelines. We analyze existing editing methods and show that they are not directly applicable to textures, since their common underlying approach, manipulating attention maps, is unsuitable for the texture domain. To address this, we propose a novel approach that instead manipulates CLIP image embeddings to condition the diffusion generation. We define editing directions using simple text prompts (e.g., "aged wood" to "new wood") and map these to CLIP image embedding space using a texture prior, with a sampling-based approach that gives us identity-preserving directions in CLIP space. To further improve identity preservation, we project these directions to a CLIP subspace that minimizes identity variations resulting from entangled texture attributes. Our editing pipeline facilitates the creation of arbitrary sliders using natural language prompts only, with no ground-truth annotated data necessary.
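
The abstract outlines a pipeline: define an editing direction from a pair of text prompts, map it into CLIP image-embedding space via a texture prior, project out identity-varying components, and scale it with a slider value to condition diffusion generation. Below is a minimal, hedged sketch of those slider mechanics in CLIP space. It is not the authors' implementation: it reuses the text-space direction directly instead of sampling from a texture prior, and `decode_image` and `nuisance_basis` are hypothetical stand-ins for a CLIP-image-conditioned diffusion decoder and for an estimated basis of identity variation.

```python
# Minimal sketch of CLIP-space slider editing, assuming a diffusion model
# conditioned on CLIP image embeddings (unCLIP-style decoder). NOT the
# paper's method: TexSliders maps the prompt pair into image-embedding
# space with a sampling-based texture prior; here the text-space direction
# is transferred directly for illustration only.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def text_embed(prompt: str) -> torch.Tensor:
    inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    z = model.get_text_features(**inputs)  # shape (1, 768) for ViT-L/14
    return z / z.norm(dim=-1, keepdim=True)

# Editing direction from a simple prompt pair, e.g. "aged wood" -> "new wood".
direction = text_embed("new wood") - text_embed("aged wood")
direction = direction / direction.norm()

def project_out(d: torch.Tensor, nuisance: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for the paper's subspace projection: remove the
    # components of d along identity-varying directions (orthonormal rows of
    # `nuisance`, e.g. estimated from embeddings of same-texture variations).
    return d - (d @ nuisance.T) @ nuisance

# With such a basis available, the direction would be refined first:
# direction = project_out(direction, nuisance_basis)  # nuisance_basis is hypothetical

@torch.no_grad()
def slider_edit(z_image: torch.Tensor, alpha: float) -> torch.Tensor:
    # Slider: move the conditioning image embedding along the edit direction,
    # then renormalize to stay on the CLIP unit hypersphere.
    z = z_image + alpha * direction
    return z / z.norm(dim=-1, keepdim=True)

# A CLIP-image-conditioned decoder would then generate the edited texture:
# edited = decode_image(slider_edit(z_image, alpha=0.6))  # decode_image is hypothetical
```

Sweeping alpha continuously yields the slider behavior the abstract describes; in the paper, identity preservation comes from the sampling-based direction estimation and the subspace projection, not from this naive text-direction transfer.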

URL

https://arxiv.org/abs/2405.00672

PDF

https://arxiv.org/pdf/2405.00672.pdf

