Paper Reading AI Learner

Practical Insights into Semi-Supervised Object Detection Approaches

2026-01-19 20:31:15
Chaoxin Wang, Bharaneeshwar Balasubramaniyam, Anurag Sangem, Nicolais Guevara, Doina Caragea

Abstract

Learning in data-scarce settings has recently gained significant attention in the research community. Semi-supervised object detection(SSOD) aims to improve detection performance by leveraging a large number of unlabeled images alongside a limited number of labeled images(a.k.a.,few-shot learning). In this paper, we present a comprehensive comparison of three state-of-the-art SSOD approaches, including MixPL, Semi-DETR and Consistent-Teacher, with the goal of understanding how performance varies with the number of labeled images. We conduct experiments using the MS-COCO and Pascal VOC datasets, two popular object detection benchmarks which allow for standardized evaluation. In addition, we evaluate the SSOD approaches on a custom Beetle dataset which enables us to gain insights into their performance on specialized datasets with a smaller number of object categories. Our findings highlight the trade-offs between accuracy, model size, and latency, providing insights into which methods are best suited for low-data regimes.

Abstract (translated)

在数据稀缺环境下进行学习最近引起了研究社区的广泛关注。半监督目标检测(SSOD)旨在通过利用大量未标记图像和少量已标注图像(即,少样本学习)来提高检测性能。在这篇论文中,我们对三种最先进的SSOD方法进行了全面比较,包括MixPL、Semi-DETR 和 Consistent-Teacher,并且我们的研究目的是了解这些方法在不同数量的标签图片下的表现差异。我们在实验中使用了两个流行的目标检测基准数据集——MS-COCO和Pascal VOC,以实现标准化评估。此外,我们还在一个自定义的甲壳虫(Beetle)数据集上对SSOD方法进行了评估,该数据集使我们可以了解这些方法在具有较少对象类别的专业化数据集上的性能表现。我们的研究结果强调了准确率、模型大小和延迟之间的权衡,并提供了关于哪些方法最适合低数据环境的见解。

URL

https://arxiv.org/abs/2601.13380

PDF

https://arxiv.org/pdf/2601.13380.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Time_Series Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot