Paper Reading AI Learner

OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model

2024-04-16 06:39:37
Runyi Li, Xuhan Sheng, Weiqi Li, Jian Zhang

Abstract

Omnidirectional images (ODIs) are commonly used in real-world visual tasks, and high-resolution ODIs help improve the performance of related visual tasks. Most existing super-resolution methods for ODIs use end-to-end learning strategies, resulting in inferior realness of generated images and a lack of effective out-of-domain generalization capabilities in training methods. Image generation methods represented by diffusion model provide strong priors for visual tasks and have been proven to be effectively applied to image restoration tasks. Leveraging the image priors of the Stable Diffusion (SD) model, we achieve omnidirectional image super-resolution with both fidelity and realness, dubbed as OmniSSR. Firstly, we transform the equirectangular projection (ERP) images into tangent projection (TP) images, whose distribution approximates the planar image domain. Then, we use SD to iteratively sample initial high-resolution results. At each denoising iteration, we further correct and update the initial results using the proposed Octadecaplex Tangent Information Interaction (OTII) and Gradient Decomposition (GD) technique to ensure better consistency. Finally, the TP images are transformed back to obtain the final high-resolution results. Our method is zero-shot, requiring no training or fine-tuning. Experiments of our method on two benchmark datasets demonstrate the effectiveness of our proposed method.

Abstract (translated)

定向图像(ODIs)通常用于现实世界的视觉任务,而高分辨率ODIs有助于提高相关视觉任务的性能。大多数现有的超分辨率方法ODIs都使用端到端学习策略,导致生成的图像的现实性较差,训练方法中缺乏有效的跨域通用能力。代表扩散模型的图像生成方法具有很强的对视觉任务的优先级,已经被证明有效地应用于图像修复任务。通过利用Stable Diffusion(SD)模型的图像先验,我们实现了一种既有保真度又有真实感的 omnidirectional 图像超分辨率,被称为OmniSSR。首先,我们将等角投影(ERP)图像转换为切线投影(TP)图像,其分布近似于平面图像域。然后,我们使用SD逐迭代采样初始高分辨率结果。在每一次去噪迭代中,我们进一步使用所提出的八面体切线信息交互(OTII)和梯度分解(GD)技术纠正和更新初始结果,确保更好的一致性。最后,TP图像转换为获得最终高分辨率结果。我们的方法是零散的,不需要训练或微调。在两个基准数据集上的实验表明,我们提出的方法的有效性。

URL

https://arxiv.org/abs/2404.10312

PDF

https://arxiv.org/pdf/2404.10312.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot