2S-ODIS: Two-Stage Omni-Directional Image Synthesis by Geometric Distortion Correction

2024-09-16 04:01:10
Atsuya Nakata, Takao Yamanaka

Abstract

Omni-directional images are increasingly used in applications such as virtual reality and social networking services (SNS). However, they are far less available than normal field-of-view (NFoV) images, because capturing them requires specialized cameras. Consequently, several methods based on generative adversarial networks (GANs) have been proposed to synthesize omni-directional images, but these models are difficult to train because training is unstable and/or very time-consuming. To address these problems, this paper proposes a novel omni-directional image synthesis method, 2S-ODIS (Two-Stage Omni-Directional Image Synthesis), which generates high-quality omni-directional images while drastically reducing the training time. This is achieved by using a VQGAN (Vector Quantized GAN) model pre-trained on a large-scale NFoV image database such as ImageNet, without fine-tuning. Since this pre-trained model does not represent the distortions of omni-directional images in the equi-rectangular projection (ERP), it cannot be applied directly to omni-directional image synthesis in ERP. Therefore, a two-stage structure is adopted: the first stage creates a global coarse image in ERP, and the second stage refines it by integrating multiple local NFoV images at a higher resolution to compensate for the distortions in ERP. Both stages are based on the pre-trained VQGAN model. As a result, 2S-ODIS reduces the training time from 14 days for OmniDreamer to four days while achieving higher image quality.
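
The relationship between an ERP image and a local NFoV view can be illustrated with a standard perspective (gnomonic) resampling. The sketch below is not taken from the paper: the function name erp_to_nfov, its parameters, and the nearest-neighbour sampling are all assumptions chosen for illustration. It only shows the geometry that makes an undistorted NFoV patch differ from the same region in ERP.

```python
import numpy as np

def erp_to_nfov(erp, yaw_deg, pitch_deg, fov_deg=90.0, out_size=64):
    """Sample a square perspective (NFoV) view from an ERP image.

    Hypothetical helper for illustration; not the paper's implementation.
    erp: array of shape (H, W) or (H, W, C) covering 360 x 180 degrees.
    """
    h, w = erp.shape[:2]

    # Pixel grid on an image plane at distance 1 in front of the camera.
    half = np.tan(np.radians(fov_deg) / 2.0)
    xs = np.linspace(-half, half, out_size)
    ys = np.linspace(half, -half, out_size)  # first row is the top of the view
    x, y = np.meshgrid(xs, ys)

    # Unit ray directions in camera coordinates: x right, y up, z forward.
    dirs = np.stack([x, y, np.ones_like(x)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate rays: pitch about the x axis (positive looks up), then yaw about y.
    p, t = np.radians(pitch_deg), np.radians(yaw_deg)
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(p), np.sin(p)],
                      [0.0, -np.sin(p), np.cos(p)]])
    rot_y = np.array([[np.cos(t), 0.0, np.sin(t)],
                      [0.0, 1.0, 0.0],
                      [-np.sin(t), 0.0, np.cos(t)]])
    dirs = dirs @ (rot_y @ rot_x).T

    # Ray direction -> longitude/latitude on the sphere.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])          # in [-pi, pi]
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))     # in [-pi/2, pi/2]

    # Longitude/latitude -> ERP pixel indices (nearest neighbour).
    u = ((lon / (2.0 * np.pi) + 0.5) * w).astype(int) % w
    v = np.clip(((0.5 - lat / np.pi) * h).astype(int), 0, h - 1)
    return erp[v, u]

# Usage: a synthetic ERP image whose pixel value is its column index.
erp = np.tile(np.arange(256), (128, 1))
front_view = erp_to_nfov(erp, yaw_deg=0.0, pitch_deg=0.0, out_size=65)
```

Near the poles, one row of the NFoV view maps to a long curved arc in ERP. A model trained only on NFoV images has never seen that stretching, which is why the abstract says it cannot be applied to ERP directly.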

URL

https://arxiv.org/abs/2409.09969

PDF

https://arxiv.org/pdf/2409.09969.pdf

