Abstract
While recent advances in deep neural networks (DNNs) have substantially enhanced the capabilities of visual AI, insufficient data diversity and volume remain a challenge, particularly in the construction domain. This study presents a novel image synthesis methodology tailored for construction worker detection, leveraging the generative-AI platform Midjourney. The approach generates a collection of 12,000 synthetic images from 3,000 distinct prompts, with an emphasis on image realism and diversity. After manual labeling, these images serve as a dataset for DNN training. Evaluation on a real construction image dataset yielded promising results: the model attained average precisions (APs) of 0.937 and 0.642 at intersection-over-union (IoU) thresholds of 0.5 and 0.5 to 0.95, respectively. Notably, the model performed near-perfectly on the synthetic dataset, achieving APs of 0.994 and 0.919 at the same two thresholds. These findings reveal both the potential and the weaknesses of generative AI in addressing DNN training data scarcity.
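The two reported figures are AP at a single IoU threshold of 0.5 and AP averaged over IoU thresholds from 0.5 to 0.95. Below is a minimal sketch of how such numbers are commonly computed for a single class (e.g. "worker"). It assumes the COCO-style convention of ten thresholds spaced 0.05 apart, greedy matching of score-sorted predictions to the highest-IoU unmatched ground-truth box, and all-point interpolation (COCO itself samples 101 recall points). The abstract does not state the exact protocol, so this is illustrative rather than the authors' evaluation code, and the function names (`iou`, `average_precision`, `ap_50_95`) are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Pred = Tuple[str, float, Box]            # (image_id, confidence score, box)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Pred], gts: Dict[str, List[Box]], thr: float) -> float:
    """Single-class AP at one IoU threshold (all-point interpolation)."""
    n_gt = sum(len(v) for v in gts.values())
    matched = {img: [False] * len(boxes) for img, boxes in gts.items()}
    tp: List[int] = []
    # Process predictions from most to least confident.
    for img, _score, box in sorted(preds, key=lambda p: -p[1]):
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts.get(img, [])):
            if matched[img][j]:
                continue  # each ground-truth box may be matched only once
            o = iou(box, g)
            if o > best_iou:
                best_iou, best_j = o, j
        if best_j >= 0 and best_iou >= thr:
            matched[img][best_j] = True
            tp.append(1)  # true positive
        else:
            tp.append(0)  # false positive (duplicate or poor localization)

    precisions, recalls, cum_tp = [], [], 0
    for i, t in enumerate(tp, start=1):
        cum_tp += t
        precisions.append(cum_tp / i)
        recalls.append(cum_tp / n_gt if n_gt else 0.0)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


def ap_50_95(preds: List[Pred], gts: Dict[str, List[Box]]) -> float:
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    gts = {"img1": [(0, 0, 10, 10)]}
    preds = [("img1", 0.9, (0, 0, 10, 5))]  # IoU with the ground truth is exactly 0.5
    print(average_precision(preds, gts, 0.5), ap_50_95(preds, gts))
```

The toy example shows why the two numbers diverge: a box with IoU 0.5 counts as a hit at the 0.5 threshold (AP = 1.0) but as a miss at every stricter threshold, so its AP over 0.5 to 0.95 drops to 0.1. A gap such as 0.937 versus 0.642 therefore points largely to localization precision rather than to missed detections.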
URL
https://arxiv.org/abs/2507.13221