SIDAR: Synthetic Image Dataset for Alignment & Restoration

2023-05-19 23:32:06
Monika Kwiatkowski, Simon Matern, Olaf Hellwich

Abstract

Image alignment and image restoration are classical computer vision tasks. However, there is still a lack of datasets that provide enough data to train and evaluate end-to-end deep learning models. Obtaining ground-truth data for image alignment requires sophisticated structure-from-motion methods or optical flow systems that often do not provide enough data variance, i.e., typically providing a high number of image correspondences, while only introducing few changes of scenery within the underlying image sequences. Alternative approaches utilize random perspective distortions on existing image data. However, this only provides trivial distortions, lacking the complexity and variance of real-world scenarios. Instead, our proposed data augmentation helps to overcome the issue of data scarcity by using 3D rendering: images are added as textures onto a plane, then varying lighting conditions, shadows, and occlusions are added to the scene. The scene is rendered from multiple viewpoints, generating perspective distortions more consistent with real-world scenarios, with homographies closely resembling those of camera projections rather than randomized homographies. For each scene, we provide a sequence of distorted images with corresponding occlusion masks, homographies, and ground-truth labels. The resulting dataset can serve as a training and evaluation set for a multitude of tasks involving image alignment and artifact removal, such as deep homography estimation, dense image matching, 2D bundle adjustment, inpainting, shadow removal, denoising, content retrieval, and background subtraction. Our data generation pipeline is customizable and can be applied to any existing dataset, serving as a data augmentation to further improve the feature learning of any existing method.
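The abstract's claim that rendering a textured plane from several viewpoints yields camera-consistent homographies can be illustrated with a small NumPy sketch. This is not the authors' rendering pipeline. It only shows the standard pinhole-camera relation for a plane at Z = 0, where the projection K [R | t] reduces to the homography H = K [r1 r2 t]. The intrinsics, poses, and function names below are hypothetical values chosen for illustration, and the convention x_cam = R X + t is an assumption.

```python
import numpy as np

def plane_to_image_homography(K, R, t):
    """Homography mapping plane points (X, Y, 1) on Z = 0 to pixel coordinates."""
    # For Z = 0 the third column of R drops out of the projection K [R | t].
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))
    return H / H[2, 2]  # fix the scale; H[2, 2] equals t_z, nonzero here

def rotation_x(angle_rad):
    """Rotation matrix about the camera x-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def apply_homography(H, points_xy):
    """Apply H to an (N, 2) array of points and dehomogenize."""
    pts = np.column_stack((points_xy, np.ones(len(points_xy))))
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical intrinsics and two hypothetical camera poses (assumptions).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R1, t1 = np.eye(3), np.array([0.0, 0.0, 2.0])            # fronto-parallel view
R2, t2 = rotation_x(np.deg2rad(30)), np.array([0.1, -0.05, 2.5])  # tilted view

H1 = plane_to_image_homography(K, R1, t1)
H2 = plane_to_image_homography(K, R2, t2)

# Homography that maps pixels of view 1 onto pixels of view 2.
H_12 = H2 @ np.linalg.inv(H1)

# Map the corners of a unit square on the plane into the second view.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(apply_homography(H2, corners))
```

Because H_12 is derived from actual camera poses, it is restricted to distortions a real camera could produce. Perturbing the eight homography parameters at random, by contrast, can yield mappings that no physical viewpoint would generate.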

URL

https://arxiv.org/abs/2305.12036

PDF

https://arxiv.org/pdf/2305.12036.pdf

