Abstract
Omni-directional images are increasingly used in applications such as virtual reality and social networking services (SNS). However, they are far less available than normal field-of-view (NFoV) images, since specialized cameras are required to capture them. Several methods based on generative adversarial networks (GANs) have therefore been proposed to synthesize omni-directional images, but training these models is difficult due to instability and/or long training times. To address these problems, this paper proposes a novel omni-directional image synthesis method, 2S-ODIS (Two-Stage Omni-Directional Image Synthesis), which generates high-quality omni-directional images while drastically reducing the training time. This is achieved by using a VQGAN (Vector Quantized GAN) model pre-trained on a large-scale NFoV image database such as ImageNet, without fine-tuning. Since this pre-trained model does not capture the distortions of omni-directional images in the equi-rectangular projection (ERP), it cannot be applied directly to omni-directional image synthesis in ERP. Therefore, a two-stage structure is adopted: a global coarse image is first generated in ERP and is then refined by integrating multiple local NFoV images at higher resolution to compensate for the ERP distortions, with both stages built on the pre-trained VQGAN model. As a result, 2S-ODIS reduces the training time from 14 days (OmniDreamer) to four days while achieving higher image quality.
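
The second stage hinges on the relation between the ERP panorama and local NFoV perspective views. As a rough illustration of that geometry only (not the authors' implementation; the function name, angle conventions, and nearest-neighbour sampling are assumptions), the sketch below extracts an NFoV crop from an ERP image by gnomonic projection:

import numpy as np

def erp_to_nfov(erp, yaw, pitch, fov_deg=90.0, out_hw=(256, 256)):
    # Sample a perspective (NFoV) view from an equirectangular (ERP) panorama.
    # erp: (H_erp, W_erp, 3) array; yaw/pitch: viewing direction in radians
    # (yaw > 0 looks right, pitch > 0 looks up); fov_deg: camera field of view.
    h_erp, w_erp = erp.shape[:2]
    h, w = out_hw
    half = np.tan(np.radians(fov_deg) / 2.0)

    # Ray through each pixel of a virtual pinhole camera looking along +z.
    xs = (2.0 * (np.arange(w) + 0.5) / w - 1.0) * half
    ys = (1.0 - 2.0 * (np.arange(h) + 0.5) / h) * half
    x, y = np.meshgrid(xs, ys)
    dirs = np.stack([x, y, np.ones_like(x)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate the rays: pitch about the x-axis, then yaw about the y-axis.
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    rot_x = np.array([[1, 0, 0], [0, cp, sp], [0, -sp, cp]])
    rot_y = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    dirs = dirs @ (rot_y @ rot_x).T

    # Ray direction -> longitude/latitude -> ERP pixel coordinates.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])        # [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))   # [-pi/2, pi/2]
    u = (lon / np.pi + 1.0) / 2.0 * (w_erp - 1)
    v = (0.5 - lat / np.pi) * (h_erp - 1)

    # Nearest-neighbour sampling keeps the sketch short; bilinear is more common.
    ui = np.clip(np.round(u).astype(int), 0, w_erp - 1)
    vi = np.clip(np.round(v).astype(int), 0, h_erp - 1)
    return erp[vi, ui]

# Hypothetical usage: eight NFoV views covering the horizon of a panorama.
# erp = ...  # load an equirectangular image as a NumPy array
# views = [erp_to_nfov(erp, yaw=np.radians(a), pitch=0.0) for a in range(0, 360, 45)]

In a refinement stage of the kind the abstract describes, such NFoV views would be processed at high resolution (here, by the pre-trained VQGAN) and reprojected back into ERP; the code above covers only the ERP-to-NFoV resampling half of that loop.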
URL
https://arxiv.org/abs/2409.09969