Abstract
We present a method for large-mask pluralistic image inpainting based on the generative framework of discrete latent codes. Our method learns latent priors, discretized as tokens, by performing computations only at the visible locations of the image. This is realized by three components: a restrictive partial encoder that predicts the token label for each visible block, a bidirectional transformer that infers the missing labels by looking only at these tokens, and a dedicated synthesis network that couples the tokens with the partial image priors to generate coherent and pluralistic complete images, even under extreme mask settings. Experiments on public benchmarks validate our design choices: the proposed method outperforms strong baselines in both visual quality and diversity metrics.
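The token-level data flow described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the learned partial encoder and bidirectional transformer are replaced by random stand-ins, the synthesis network is omitted, and the names `visible_blocks`, `encode_visible`, `fill_missing`, the `MISSING` placeholder, the codebook size of 512, and the rule that a block counts as visible only when it contains no masked pixel are all assumptions made for illustration.

```python
import random

MISSING = -1  # hypothetical placeholder label for blocks that still need a token


def visible_blocks(mask, block):
    """Return a grid of booleans, True where a block has no masked pixel.

    mask is a list of rows; 1 marks a visible pixel, 0 a masked one.
    Treating only fully visible blocks as visible is an assumption.
    """
    h, w = len(mask), len(mask[0])
    grid = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            row.append(all(
                mask[y][x] == 1
                for y in range(by, min(by + block, h))
                for x in range(bx, min(bx + block, w))
            ))
        grid.append(row)
    return grid


def encode_visible(grid, codebook_size, rng):
    """Stand-in for the partial encoder: assign labels to visible blocks only."""
    return [
        [rng.randrange(codebook_size) if vis else MISSING for vis in row]
        for row in grid
    ]


def fill_missing(tokens, codebook_size, rng):
    """Stand-in for the transformer: sample a label for each missing block.

    Labels of visible blocks are left untouched.
    """
    return [
        [t if t != MISSING else rng.randrange(codebook_size) for t in row]
        for row in tokens
    ]


# A 4x4 image whose right half is masked, split into 2x2 blocks.
mask = [[1, 1, 0, 0] for _ in range(4)]
grid = visible_blocks(mask, 2)
tokens = encode_visible(grid, 512, random.Random(0))

# Different seeds give different completions of the same visible tokens.
sample_a = fill_missing(tokens, 512, random.Random(1))
sample_b = fill_missing(tokens, 512, random.Random(2))
```

In this sketch, pluralism corresponds to drawing several completions (`sample_a`, `sample_b`) from the same set of visible tokens. In the method described above, a learned prior would take the place of the uniform sampling used here.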
URL
https://arxiv.org/abs/2403.18186