Abstract
Training autonomous driving systems requires extensive datasets with precise annotations to attain robust performance. Human annotations suffer from imperfections, and multiple iterations are often needed to produce high-quality datasets. However, manually reviewing large datasets is laborious and expensive. In this paper, we introduce the AutoVDC (Automated Vision Data Cleaning) framework and investigate the use of Vision-Language Models (VLMs) to automatically identify erroneous annotations in vision datasets, enabling users to eliminate these errors and enhance data quality. We validate our approach on the KITTI and nuImages datasets, which contain object detection benchmarks for autonomous driving. To test the effectiveness of AutoVDC, we create dataset variants with intentionally injected erroneous annotations and measure the error detection rate of our approach. Additionally, we compare detection rates across different VLMs and explore the impact of VLM fine-tuning on our pipeline. The results demonstrate our method's high performance in error detection and data cleaning experiments, indicating its potential to significantly improve the reliability and accuracy of large-scale production datasets in autonomous driving.
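The evaluation protocol described in the abstract (injecting erroneous annotations, then measuring what fraction a cleaner flags) can be sketched as below. This is a minimal illustration under assumed conventions: the annotation schema (`id`/`label` dicts), the function names, and the label-swap corruption model are illustrative assumptions, not details from the paper.

```python
import random


def inject_label_errors(annotations, error_rate, seed=0):
    """Corrupt a fraction of annotations by swapping in a wrong class label.

    Returns the noisy annotation list plus the set of corrupted annotation
    ids, which serves as ground truth for scoring an error detector.
    (Hypothetical sketch; the paper's actual injection scheme may differ.)
    """
    rng = random.Random(seed)
    all_labels = sorted({a["label"] for a in annotations})
    noisy, corrupted_ids = [], set()
    for ann in annotations:
        ann = dict(ann)  # copy so the original dataset is untouched
        if rng.random() < error_rate:
            wrong = [lab for lab in all_labels if lab != ann["label"]]
            if wrong:
                ann["label"] = rng.choice(wrong)
                corrupted_ids.add(ann["id"])
        noisy.append(ann)
    return noisy, corrupted_ids


def error_detection_rate(flagged_ids, corrupted_ids):
    """Fraction of injected errors that the cleaning pipeline flagged (recall)."""
    if not corrupted_ids:
        return 1.0
    return len(flagged_ids & corrupted_ids) / len(corrupted_ids)
```

A VLM-based cleaner would produce `flagged_ids` by querying the model about each image/annotation pair; `error_detection_rate` then scores it against the known injected errors.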
URL
https://arxiv.org/abs/2507.12414