Abstract
This paper studies how to effectively use synthetic data to train deep neural networks for industrial parts classification, in particular by taking into account the domain gap to real-world images. To this end, we introduce a synthetic dataset that may serve as a preliminary testbed for the Sim-to-Real challenge; it contains 17 objects from six industrial use cases, including isolated and assembled parts. A few subsets of objects are highly similar in shape and albedo, reflecting challenging cases of industrial parts. All sample images come both with and without random backgrounds and post-processing, which makes it possible to evaluate the importance of domain randomization. We call it the Synthetic Industrial Parts dataset (SIP-17). We study the usefulness of SIP-17 by benchmarking five state-of-the-art deep network models, both supervised and self-supervised, trained only on the synthetic data and tested on real data. From the results, we derive insights into the feasibility and challenges of using synthetic data for industrial parts classification and of developing larger-scale synthetic datasets. Our dataset and code are publicly available.
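The abstract states that every sample is provided with and without random backgrounds and post-processing, so that the effect of domain randomization can be measured. Below is a minimal sketch of how such image variants might be produced from a rendered object. The function name `make_variant`, the alpha-compositing scheme, the uniform-noise stand-in for a background photo, and the brightness and noise parameters are all assumptions made for illustration. This is not the SIP-17 generation pipeline.

```python
import numpy as np


def make_variant(render_rgba, rng, random_background=True, post_process=True,
                 plain_value=0.0, noise_std=0.02, brightness_range=(0.8, 1.2)):
    """Build one training image from a rendered RGBA object (hypothetical pipeline).

    render_rgba: float array (H, W, 4) in [0, 1]; the alpha channel masks the object.
    Returns a float array (H, W, 3) in [0, 1].
    """
    h, w, _ = render_rgba.shape
    rgb, alpha = render_rgba[..., :3], render_rgba[..., 3:4]
    if random_background:
        # Assumed stand-in for a random background photo: a uniform noise texture.
        background = rng.uniform(0.0, 1.0, size=(h, w, 3))
    else:
        # Plain constant background for the "without randomization" variant.
        background = np.full((h, w, 3), plain_value)
    # Alpha compositing: object pixels over the background.
    image = alpha * rgb + (1.0 - alpha) * background
    if post_process:
        # Assumed post-processing: global brightness jitter plus Gaussian noise.
        image = image * rng.uniform(*brightness_range)
        image = image + rng.normal(0.0, noise_std, size=image.shape)
    return np.clip(image, 0.0, 1.0)


# Usage: an opaque gray square stands in for a rendered industrial part.
rng = np.random.default_rng(0)
render = np.zeros((64, 64, 4))
render[16:48, 16:48] = [0.7, 0.7, 0.7, 1.0]
# All four combinations, e.g. for an ablation of background and post-processing.
variants = {
    (bg, pp): make_variant(render, rng, random_background=bg, post_process=pp)
    for bg in (False, True)
    for pp in (False, True)
}
```

Generating all four combinations from the same render keeps the object fixed. Any difference in real-image accuracy between models trained on each variant can then be attributed to the background and post-processing choices.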
URL
https://arxiv.org/abs/2404.08778