Paper Reading AI Learner

MiSuRe is all you need to explain your image segmentation

2024-06-18 00:45:54
Syed Nouman Hasany, Fabrice Mériaudeau, Caroline Petitjean


The last decade of computer vision has been dominated by deep learning architectures, thanks to their unparalleled success. Their performance, however, often comes at the cost of explainability owing to their highly non-linear nature. Consequently, a parallel field of eXplainable Artificial Intelligence (XAI) has developed with the aim of generating insights into the decision-making process of deep learning models. An important problem in XAI is the generation of saliency maps, which highlight the regions of an input image that contributed most to the model's final decision. Most work in this regard, however, has focused on image classification; image segmentation, despite being a ubiquitous task, has not received the same attention. In the present work, we propose MiSuRe (Minimally Sufficient Region), an algorithm to generate saliency maps for image segmentation. The saliency maps generated by MiSuRe aim to discard irrelevant regions and highlight only those regions of the input image that are crucial to the segmentation decision. We perform our analysis on three datasets: Triangle (artificially constructed), COCO-2017 (natural images), and Synapse multi-organ (medical images). Additionally, we identify a potential use case for these post-hoc saliency maps: assessing the reliability of the segmentation model post hoc.
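The abstract does not spell out how MiSuRe searches for its region, so the following is only a minimal sketch of the general idea of a "sufficient region", not the authors' algorithm. It assumes a greedy patch-deletion strategy and a Dice threshold, both chosen here for illustration. The names `toy_segment`, `dice`, `sufficient_region`, `patch`, and `min_dice` are hypothetical, and `toy_segment` is a trivial stand-in for a real segmentation model.

```python
def toy_segment(image):
    """Hypothetical stand-in model: a pixel is foreground if brighter than 0.5."""
    return [[1 if v > 0.5 else 0 for v in row] for row in image]


def dice(a, b):
    """Dice overlap between two binary masks given as lists of 0/1 rows."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 1.0 if total == 0 else 2.0 * inter / total


def sufficient_region(image, segment, patch=2, min_dice=0.9):
    """Greedily blank out patches whose removal keeps the segmentation intact.

    Assumption: this greedy deletion is an illustrative substitute, not the
    procedure from the paper. Returns a 0/1 mask of the pixels that were kept.
    """
    h, w = len(image), len(image[0])
    target = segment(image)              # segmentation of the full image
    keep = [[1] * w for _ in range(h)]   # 1 = pixel is still visible
    current = [row[:] for row in image]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            cells = [(y, x)
                     for y in range(top, min(top + patch, h))
                     for x in range(left, min(left + patch, w))]
            trial = [row[:] for row in current]
            for y, x in cells:
                trial[y][x] = 0.0        # blank this patch
            # Accept the deletion only if the prediction barely changes.
            if dice(segment(trial), target) >= min_dice:
                current = trial
                for y, x in cells:
                    keep[y][x] = 0
    return keep


# A 4x4 image with one bright 2x2 object in the top-right corner.
image = [
    [0.1, 0.1, 0.9, 0.9],
    [0.1, 0.1, 0.9, 0.9],
    [0.1, 0.2, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
region = sufficient_region(image, toy_segment)
```

Because the toy model is purely per-pixel, the region that survives is the object itself. A real network relies on context around the object, so the surviving region would generally differ from the predicted mask, which is what makes such maps informative.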



