Abstract
Many safety-critical applications, especially in autonomous driving, require reliable object detectors. These detectors can be effectively supported by a method that searches for and identifies potential failures and systematic errors before deployment. Systematic errors are characterized by combinations of attributes such as object location, scale, orientation, and color, as well as the composition of the respective backgrounds. To identify them, one cannot rely solely on real images from a test set, because such sets do not cover very rare but possible combinations of attributes. To overcome this limitation, we propose a pipeline for generating realistic synthetic scenes with fine-grained control, allowing the creation of complex scenes with multiple objects. Our approach, BEV2EGO, enables realistic generation of the complete scene with road-contingent control, mapping 2D bird's-eye-view (BEV) scene configurations to a first-person view (EGO). In addition, we propose a benchmark for controlled scene generation to select the most appropriate generative outpainting model for BEV2EGO. We further use this benchmark to perform a systematic analysis of multiple state-of-the-art object detection models and uncover differences between them.
URL
https://arxiv.org/abs/2404.07045