Abstract
Generative models have enabled intuitive image creation and manipulation using natural language. In particular, diffusion models have recently shown remarkable results for natural image editing. In this work, we propose to apply diffusion techniques to edit textures, a specific class of images that are an essential part of 3D content creation pipelines. We analyze existing editing methods and show that they are not directly applicable to textures, since their common underlying approach, manipulating attention maps, is unsuitable for the texture domain. To address this, we propose a novel approach that instead manipulates CLIP image embeddings to condition the diffusion generation. We define editing directions using simple text prompts (e.g., "aged wood" to "new wood") and map these to CLIP image embedding space using a texture prior, with a sampling-based approach that gives us identity-preserving directions in CLIP space. To further improve identity preservation, we project these directions to a CLIP subspace that minimizes identity variations resulting from entangled texture attributes. Our editing pipeline facilitates the creation of arbitrary sliders using natural language prompts only, with no ground-truth annotated data necessary.
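The core mechanism described above, a sampling-based editing direction in CLIP image embedding space followed by a projection that suppresses identity variation, can be sketched numerically. This is a minimal illustration only: the random arrays stand in for CLIP image embeddings produced by the paper's texture prior (e.g., samples conditioned on "aged wood" vs. "new wood"), and the SVD-based subspace is an assumed stand-in for the paper's identity-variation subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 512  # typical CLIP ViT-B/32 image embedding dimension

# Stand-ins for CLIP image embeddings sampled via a texture prior
# (hypothetical data; the paper samples these from prompt-conditioned
# generations such as "aged wood" and "new wood").
src = rng.normal(size=(64, dim))   # samples for the source prompt
tgt = rng.normal(size=(64, dim))   # samples for the target prompt

# Sampling-based editing direction: difference of mean embeddings,
# normalized to unit length.
direction = tgt.mean(axis=0) - src.mean(axis=0)
direction /= np.linalg.norm(direction)

# Identity-preserving projection: remove the components of the direction
# that lie along the axes of largest embedding variation (estimated here
# with a plain SVD of the centered samples; the choice of 8 axes is an
# assumption for illustration, not the paper's learned subspace).
centered = np.concatenate([src - src.mean(0), tgt - tgt.mean(0)])
_, _, vt = np.linalg.svd(centered, full_matrices=False)
identity_axes = vt[:8]
projected = direction - identity_axes.T @ (identity_axes @ direction)
projected /= np.linalg.norm(projected)
```

Scaling `projected` by a user-chosen coefficient before conditioning the diffusion model is what yields a natural-language-defined "slider": small coefficients give subtle edits, larger ones stronger edits, with no annotated training data involved.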
URL
https://arxiv.org/abs/2405.00672