PromptMix: Text-to-image diffusion models enhance the performance of lightweight networks

Abstract

Many deep learning tasks require annotations that are too time-consuming for human operators to produce at scale, resulting in small datasets. This is especially true for dense regression problems such as crowd counting, which requires the location of every person in the image to be annotated. Techniques such as data augmentation and simulation-based synthetic data generation can help in such cases. In this paper, we introduce PromptMix, a method for artificially boosting the size of existing datasets that can be used to improve the performance of lightweight networks. First, synthetic images are generated in an end-to-end data-driven manner: text prompts are extracted from existing datasets via an image captioning deep network and then fed to text-to-image diffusion models. The generated images are then annotated using one or more high-performing deep networks and mixed with the real dataset for training the lightweight network. Through extensive experiments on five datasets and two tasks, we show that PromptMix can significantly increase the performance of lightweight networks, by up to 26%.
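
The four steps described in the abstract (caption, generate, annotate, mix) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `promptmix_sketch`, its parameters, and the stand-in callables are all hypothetical. The sketch also assumes that annotations are scalar values (for example, crowd counts) and that the outputs of several annotator networks are combined by a simple mean; the abstract does not specify how they are combined.

```python
import random
from typing import Any, Callable, List, Sequence, Tuple

Sample = Tuple[Any, Any]  # (image, annotation); concrete types are left open


def promptmix_sketch(
    real_data: Sequence[Sample],
    caption: Callable[[Any], str],        # hypothetical image-captioning model
    generate: Callable[[str], Any],       # hypothetical text-to-image model
    annotators: Sequence[Callable[[Any], float]],  # hypothetical strong networks
    n_per_prompt: int = 1,
    seed: int = 0,
) -> List[Sample]:
    """Return the real samples mixed with pseudo-labelled synthetic samples."""
    if not annotators:
        raise ValueError("need at least one annotator")

    synthetic: List[Sample] = []
    for image, _real_label in real_data:
        prompt = caption(image)            # 1. text prompt from a real image
        for _ in range(n_per_prompt):
            fake = generate(prompt)        # 2. synthetic image from the prompt
            # 3. pseudo-label: mean of the annotator outputs (an assumption)
            label = sum(a(fake) for a in annotators) / len(annotators)
            synthetic.append((fake, label))

    mixed = list(real_data) + synthetic    # 4. mix real and synthetic samples
    random.Random(seed).shuffle(mixed)     # seeded shuffle for reproducibility
    return mixed


# Usage with toy stand-ins in place of real models.
real = [("img_a", 3.0), ("img_b", 5.0)]
mixed = promptmix_sketch(
    real,
    caption=lambda img: f"a photo like {img}",
    generate=lambda prompt: f"synthetic<{prompt}>",
    annotators=[lambda img: 4.0, lambda img: 6.0],
    n_per_prompt=2,
)
print(len(mixed))  # 2 real + 4 synthetic samples
```

Passing the models in as plain callables keeps the sketch independent of any particular captioning or diffusion library. The returned list would then serve as the training set for the lightweight network.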

URL

https://arxiv.org/abs/2301.12914

PDF

https://arxiv.org/pdf/2301.12914.pdf