Pseudo-Labeling Driven Refinement of Benchmark Object Detection Datasets via Analysis of Learning Patterns

2025-06-01 12:57:58
Min Je Kim, Muhammad Munsif, Altaf Hussain, Hikmat Yar, Sung Wook Baik

Abstract

Benchmark object detection (OD) datasets play a pivotal role in advancing computer vision applications such as autonomous driving and surveillance, as well as in training and evaluating state-of-the-art deep learning-based detection models. Among them, MS-COCO has become a standard benchmark due to its diverse object categories and complex scenes. However, despite its wide adoption, MS-COCO suffers from various annotation issues, including missing labels, incorrect class assignments, inaccurate bounding boxes, duplicate labels, and group labeling inconsistencies. These errors not only hinder model training but also degrade the reliability and generalization of OD models. To address these challenges, we propose a comprehensive refinement framework and present MJ-COCO, a newly re-annotated version of MS-COCO. Our approach begins with loss- and gradient-based error detection to identify potentially mislabeled or hard-to-learn samples. Next, we apply a four-stage pseudo-labeling refinement process: (1) bounding box generation using invertible transformations, (2) IoU-based duplicate removal and confidence merging, (3) class consistency verification via an expert object recognizer, and (4) spatial adjustment based on object region activation map analysis. This integrated pipeline enables scalable and accurate correction of annotation errors without manual re-labeling. Extensive experiments were conducted across four validation datasets: MS-COCO, Sama COCO, Objects365, and PASCAL VOC. Models trained on MJ-COCO consistently outperformed those trained on MS-COCO, achieving improvements in Average Precision (AP) and small-object AP (AP_S). MJ-COCO also demonstrated significant gains in annotation coverage: for example, the number of small object annotations increased by more than 200,000 compared to MS-COCO.
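The abstract does not specify how stage (2), IoU-based duplicate removal and confidence merging, resolves overlapping pseudo-labels. The following is a minimal sketch, assuming a greedy, class-aware clustering with a hypothetical IoU threshold of 0.5. Each cluster keeps its highest confidence and a confidence-weighted average box, similar in spirit to weighted box fusion. The names `iou` and `dedup_and_merge`, the threshold, and the merge rule are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2) in pixels
Detection = Tuple[Box, float, str]        # (box, confidence, class label)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def dedup_and_merge(dets: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Hypothetical stage-(2) sketch: cluster same-class boxes whose IoU with the
    current highest-confidence box is >= iou_thr, then emit one merged box per cluster."""
    # Process candidates from most to least confident (greedy, NMS-style).
    remaining = sorted(dets, key=lambda d: d[1], reverse=True)
    merged: List[Detection] = []
    while remaining:
        top = remaining.pop(0)
        cluster, rest = [top], []
        for d in remaining:
            # Only boxes of the same class are treated as duplicates (assumption).
            if d[2] == top[2] and iou(d[0], top[0]) >= iou_thr:
                cluster.append(d)
            else:
                rest.append(d)
        remaining = rest
        total = sum(score for _, score, _ in cluster)
        if total > 0:
            # Confidence-weighted average of coordinates (assumed merge rule).
            box = tuple(
                sum(b[i] * score for b, score, _ in cluster) / total for i in range(4)
            )
        else:
            box = top[0]
        # Keep the strongest confidence in the cluster (assumed merge rule).
        merged.append((box, max(score for _, score, _ in cluster), top[2]))
    return merged


if __name__ == "__main__":
    pseudo_labels = [
        ((0, 0, 10, 10), 0.9, "person"),
        ((1, 1, 11, 11), 0.6, "person"),   # near-duplicate of the first box
        ((50, 50, 60, 60), 0.8, "dog"),
    ]
    for det in dedup_and_merge(pseudo_labels):
        print(det)
```

In this example the two overlapping "person" boxes (IoU of about 0.68) collapse into a single box with confidence 0.9, and the "dog" box passes through unchanged. Restricting merging to same-class boxes is a design choice that keeps distinct, overlapping objects of different categories apart. Whether the paper's stage (2) does the same is not stated in the abstract.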

URL

https://arxiv.org/abs/2506.00997

PDF

https://arxiv.org/pdf/2506.00997.pdf