Paper Reading AI Learner

Identification of Fine-grained Systematic Errors via Controlled Scene Generation

2024-04-10 14:35:22
Valentyn Boreiko, Matthias Hein, Jan Hendrik Metzen

Abstract

Many safety-critical applications, especially in autonomous driving, require reliable object detectors. Such detectors benefit greatly from a method that searches for and identifies potential failures and systematic errors before deployment. Systematic errors are characterized by combinations of attributes such as object location, scale, orientation, and color, as well as the composition of their respective backgrounds. Identifying them requires more than real images from a test set, since test sets do not cover very rare but possible combinations of attributes. To overcome this limitation, we propose a pipeline for generating realistic synthetic scenes with fine-grained control, allowing the creation of complex scenes with multiple objects. Our approach, BEV2EGO, allows for a realistic generation of the complete scene with road-contingent control that maps 2D bird's-eye view (BEV) scene configurations to a first-person view (EGO). In addition, we propose a benchmark for controlled scene generation to select the most appropriate generative outpainting model for BEV2EGO. We further use it to perform a systematic analysis of multiple state-of-the-art object detection models and discover differences between them.
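The abstract describes searching for systematic errors over combinations of object attributes (location, scale, orientation, color) in a BEV scene configuration. A minimal sketch of that idea is shown below; the `SceneObject` fields and `enumerate_configurations` helper are illustrative assumptions, not the paper's actual interface.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class SceneObject:
    # Hypothetical BEV attributes; names are illustrative, not from the paper.
    x: float             # position along the road in the BEV plane (m)
    y: float             # lateral offset in the BEV plane (m)
    scale: float         # relative object size
    orientation_deg: float
    color: str

def enumerate_configurations(xs, ys, scales, orientations, colors):
    """Enumerate fine-grained attribute combinations, the kind of grid one
    could render via BEV-to-EGO generation to probe a detector for
    systematic errors (rare but possible attribute combinations)."""
    return [SceneObject(*combo)
            for combo in product(xs, ys, scales, orientations, colors)]

configs = enumerate_configurations(
    xs=[5.0, 20.0], ys=[-1.5, 1.5], scales=[1.0],
    orientations=[0.0, 90.0], colors=["red", "white"])
print(len(configs))  # 2 * 2 * 1 * 2 * 2 = 16 combinations
```

Each configuration would then be rendered into an EGO-view image and passed through the detector; combinations with consistently low detection scores point to candidate systematic errors.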

URL

https://arxiv.org/abs/2404.07045

PDF

https://arxiv.org/pdf/2404.07045.pdf
