PromptMix: Text-to-image diffusion models enhance the performance of lightweight networks

2023-01-30 14:15:47
Arian Bakhtiarnia, Qi Zhang, Alexandros Iosifidis

Abstract

Many deep learning tasks require annotations that are too time-consuming for human operators, resulting in small dataset sizes. This is especially true for dense regression problems such as crowd counting, which requires the location of every person in the image to be annotated. Techniques such as data augmentation and synthetic data generation based on simulations can help in such cases. In this paper, we introduce PromptMix, a method for artificially boosting the size of existing datasets that can be used to improve the performance of lightweight networks. First, synthetic images are generated in an end-to-end data-driven manner: text prompts are extracted from existing datasets via an image captioning deep network and then fed to text-to-image diffusion models. The generated images are then annotated using one or more high-performing deep networks and mixed with the real dataset for training the lightweight network. Through extensive experiments on five datasets and two tasks, we show that PromptMix can significantly increase the performance of lightweight networks, by up to 26%.
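
The pipeline described in the abstract (caption, generate, annotate, mix) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `promptmix_sketch` and the parameters `caption`, `generate`, `annotators`, `combine`, and `images_per_prompt` are hypothetical placeholders, and the abstract does not say how the outputs of several annotating networks are merged or in what ratio real and synthetic samples are mixed.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

Sample = Tuple[Any, Any]  # (image, annotation)


def promptmix_sketch(
    real_data: Sequence[Sample],
    caption: Callable[[Any], str],               # assumed: image captioning network
    generate: Callable[[str], Any],              # assumed: text-to-image diffusion model
    annotators: Sequence[Callable[[Any], Any]],  # assumed: high-performing networks
    combine: Callable[[List[Any]], Any],         # assumed: rule for merging their outputs
    images_per_prompt: int = 1,                  # assumed: not given in the abstract
    seed: int = 0,
) -> List[Sample]:
    """Return the real samples mixed with pseudo-labelled synthetic samples."""
    synthetic: List[Sample] = []
    for image, _ in real_data:
        # Step 1: extract a text prompt from an existing real image.
        prompt = caption(image)
        for _ in range(images_per_prompt):
            # Step 2: generate a synthetic image from that prompt.
            fake = generate(prompt)
            # Step 3: annotate it with one or more strong networks.
            label = combine([annotate(fake) for annotate in annotators])
            synthetic.append((fake, label))
    # Step 4: mix real and synthetic samples into one training set.
    mixed = list(real_data) + synthetic
    random.Random(seed).shuffle(mixed)
    return mixed
```

The returned list would then serve as the training set for the lightweight network. Passing the models in as plain callables keeps the sketch independent of any particular captioning, diffusion, or annotation library.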

URL

https://arxiv.org/abs/2301.12914

PDF

https://arxiv.org/pdf/2301.12914.pdf

