Paper Reading AI Learner

Conformal Prediction Sets for Instance Segmentation

2026-02-10 18:15:06
Kerri Lu, Dan M. Kluger, Stephen Bates, Sherrie Wang

Abstract

Current instance segmentation models achieve high performance on average predictions, but lack principled uncertainty quantification: their outputs are not calibrated, and there is no guarantee that a predicted mask is close to the ground truth. To address this limitation, we introduce a conformal prediction algorithm to generate adaptive confidence sets for instance segmentation. Given an image and a pixel coordinate query, our algorithm generates a confidence set of instance predictions for that pixel, with a provable guarantee on the probability that at least one of the predictions has high Intersection-over-Union (IoU) with the true object instance mask. We apply our algorithm to instance segmentation examples in agricultural field delineation, cell segmentation, and vehicle detection. Empirically, we find that our prediction sets vary in size based on query difficulty and attain the target coverage, outperforming existing baselines such as Learn Then Test, Conformal Risk Control, and morphological dilation-based methods. We provide versions of the algorithm with asymptotic and finite-sample guarantees.
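To make the kind of guarantee described in the abstract concrete, here is a minimal sketch of a generic split-conformal calibration over ranked candidate masks. It is not the authors' algorithm, which the abstract does not specify. The function names, the assumption that each query comes with a ranked list of candidate masks, and the choice of "smallest covering set size" as the nonconformity score are all illustrative assumptions.

```python
import math
import numpy as np


def iou(a, b):
    """Intersection-over-Union of two binary masks (0.0 if both are empty)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def min_covering_size(candidates, truth, tau):
    """Hypothetical score: smallest k such that one of the first k ranked
    candidates has IoU >= tau with the true mask; len(candidates) + 1 if none."""
    for k, mask in enumerate(candidates, start=1):
        if iou(mask, truth) >= tau:
            return k
    return len(candidates) + 1


def conformal_set_size(scores, alpha):
    """Standard split-conformal quantile: the ceil((n + 1)(1 - alpha))-th
    smallest calibration score, or infinity if n is too small for alpha."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def prediction_set(candidates, k_hat):
    """Return the first k_hat ranked candidates (all of them if k_hat is larger)."""
    if k_hat == math.inf:
        return list(candidates)
    return list(candidates[: int(k_hat)])


# Toy usage with 2x2 masks and made-up calibration scores.
truth = np.array([[1, 1], [0, 0]])
poor = np.array([[1, 0], [0, 0]])
calibration_scores = [1, 1, 2, 1, 3, 1, 1, 2, 1, 1]
k_hat = conformal_set_size(calibration_scores, alpha=0.2)
test_set = prediction_set([poor, truth], k_hat)
```

Under the usual exchangeability assumption between calibration and test queries, returning the top `k_hat` candidates contains a mask with IoU at least `tau` with probability at least 1 - alpha, provided `k_hat` does not exceed the number of available candidates. The sets here have a fixed size, so the sketch does not reproduce the adaptive set sizes the abstract reports.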

Abstract (translated)

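Current instance segmentation models perform well in terms of average predictive performance, but lack principled uncertainty quantification: their outputs are not calibrated, and there is no guarantee that a predicted mask is close to the ground truth. To address this limitation, we introduce a conformal prediction algorithm for generating adaptive confidence sets for instance segmentation. Given an image and a pixel coordinate query, our algorithm generates a confidence set of instance predictions for that pixel, with a provable guarantee on the probability that at least one prediction has high Intersection-over-Union (IoU) with the true object instance mask. We apply the algorithm to instance segmentation examples in agricultural field delineation, cell segmentation, and vehicle detection. Experiments show that our prediction sets vary with query difficulty and reach the target coverage, outperforming existing baselines such as Learn Then Test, Conformal Risk Control, and morphological dilation-based methods. We provide versions of the algorithm with both asymptotic and finite-sample guarantees.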

URL

https://arxiv.org/abs/2602.10045

PDF

https://arxiv.org/pdf/2602.10045.pdf

