Paper Reading AI Learner

A survey of synthetic data augmentation methods in computer vision

2024-03-15 07:34:08
Alhassan Mumuni, Fuseini Mumuni, Nana Kobina Gerrar

Abstract

The standard approach to tackling computer vision problems is to train deep convolutional neural network (CNN) models using large-scale image datasets which are representative of the target task. However, in many scenarios, it is often challenging to obtain sufficient image data for the target task. Data augmentation is a way to mitigate this challenge. A common practice is to explicitly transform existing images in desired ways so as to create the required volume and variability of training data necessary to achieve good generalization performance. In situations where data for the target domain is not accessible, a viable workaround is to synthesize training data from scratch--i.e., synthetic data augmentation. This paper presents an extensive review of synthetic data augmentation techniques. It covers data synthesis approaches based on realistic 3D graphics modeling, neural style transfer (NST), differential neural rendering, and generative artificial intelligence (AI) techniques such as generative adversarial networks (GANs) and variational autoencoders (VAEs). For each of these classes of methods, we focus on the important data generation and augmentation techniques, general scope of application and specific use-cases, as well as existing limitations and possible workarounds. Additionally, we provide a summary of common synthetic datasets for training computer vision models, highlighting the main features, application domains and supported tasks. Finally, we discuss the effectiveness of synthetic data augmentation methods. Since this is the first paper to explore synthetic data augmentation methods in great detail, we are hoping to equip readers with the necessary background information and in-depth knowledge of existing methods and their attendant issues.

Abstract (translated)

解决计算机视觉问题的标准方法是使用大型图像数据集训练深度卷积神经网络(CNN)模型,这些数据集代表目标任务。然而,在许多情况下,获得足够的目标任务图像数据具有挑战性。数据增强是一种减轻这一挑战的方法。一种常见的做法是对现有的图像进行显式转换,以便创建实现良好泛化性能所需的训练数据量。在目标领域数据不可访问的情况下,一个可行的解决方法是从零开始合成训练数据,即合成数据增强。 本文对合成数据增强技术进行了全面的回顾。它涵盖了基于现实3D图形建模的数据生成方法、神经风格迁移(NST)、差分神经渲染和生成人工智能(AI)技术(如生成对抗网络(GANs)和变分自编码器(VAEs)的数据生成方法。对于每种方法,我们重点关注重要的数据生成和增强技术、应用范围和具体用例,以及现有的局限性和可能的解决方案。此外,我们还提供了用于训练计算机视觉模型的常见合成数据集的总结,突出了主要特点、应用领域和支持任务。最后,我们讨论了合成数据增强方法的有效性。由于这是对详细探索合成数据增强方法的第一篇论文,我们希望能够为读者提供必要的背景信息和现有方法的深入知识及其相关问题。

URL

https://arxiv.org/abs/2403.10075

PDF

https://arxiv.org/pdf/2403.10075.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot