Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

2021-12-01 10:08:24

Woncheol Shin, Gyubok Lee, Jiyoung Lee, Joonseok Lee, Edward Choi

arXiv_CV

arXiv_CV Image_Caption GAN Embedding Text_Generation Pose


Abstract

Vector-quantized image modeling has recently demonstrated impressive performance on generation tasks such as text-to-image generation. However, we find that current image quantizers do not satisfy translation equivariance in the quantized space because of aliasing, which degrades performance in downstream text-to-image and image-to-text generation, even in simple experimental setups. Instead of focusing on anti-aliasing, we take a direct approach to encouraging translation equivariance in the quantized space. In particular, we explore a desirable property of image quantizers, which we call 'Translation Equivariance in the Quantized Space', and propose a simple but effective way to achieve it: regularizing the codebook embedding vectors toward orthogonality. Using this method, we improve accuracy by +22% in text-to-image generation and +26% in image-to-text generation, outperforming VQGAN.
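
As an illustration of what "regularizing orthogonality in the codebook embedding vectors" could look like, here is a minimal NumPy sketch. The abstract does not give the exact loss, so the form used here (mean squared deviation of the cosine-similarity Gram matrix from the identity) is an assumption, and the function name `orthogonality_penalty` is hypothetical rather than taken from the paper.

```python
import numpy as np


def orthogonality_penalty(codebook: np.ndarray) -> float:
    """Penalty that is zero when all codebook rows are mutually orthogonal.

    codebook: array of shape (K, D), one embedding vector per code.
    Assumed form (not confirmed by the abstract): ||E_n E_n^T - I||_F^2 / K^2,
    where E_n holds the L2-normalized rows of the codebook.
    """
    # Normalize each embedding so the Gram matrix contains cosine similarities.
    norms = np.linalg.norm(codebook, axis=1, keepdims=True)
    unit = codebook / np.maximum(norms, 1e-12)

    # Pairwise cosine similarities between all codes, shape (K, K).
    gram = unit @ unit.T

    # Squared deviation from the identity, averaged over the K*K entries.
    k = codebook.shape[0]
    return float(np.sum((gram - np.eye(k)) ** 2) / k**2)


# Rows of a truncated identity matrix are orthonormal.
orthonormal_codes = np.eye(4, 8)
# Four identical rows: every pair of codes has cosine similarity 1.
collapsed_codes = np.ones((4, 8))

print(orthogonality_penalty(orthonormal_codes))
print(orthogonality_penalty(collapsed_codes))
```

The orthonormal codebook gives a penalty of zero, while the collapsed codebook gives about 0.75 (12 off-diagonal entries of 1 out of 16). In training, a term of this kind would typically be added to the quantizer's loss with a weighting coefficient; that weight and how it interacts with the other loss terms are not specified in the abstract.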

URL

https://arxiv.org/abs/2112.00384

PDF

https://arxiv.org/pdf/2112.00384.pdf
