Paper Reading AI Learner

Panoptic Segmentation

2019-04-10 18:17:40
Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, Piotr Dollár

Abstract

We propose and study a task we name panoptic segmentation (PS). Panoptic segmentation unifies the typically distinct tasks of semantic segmentation (assign a class label to each pixel) and instance segmentation (detect and segment each object instance). The proposed task requires generating a coherent scene segmentation that is rich and complete, an important step toward real-world vision systems. While early work in computer vision addressed related image/scene parsing tasks, these are not currently popular, possibly due to lack of appropriate metrics or associated recognition challenges. To address this, we propose a novel panoptic quality (PQ) metric that captures performance for all classes (stuff and things) in an interpretable and unified manner. Using the proposed metric, we perform a rigorous study of both human and machine performance for PS on three existing datasets, revealing interesting insights about the task. The aim of our work is to revive the interest of the community in a more unified view of image segmentation.

Abstract (translated)

我们提出并研究了一个叫做泛光分割(PS)的任务。泛光分割统一了语义分割(为每个像素分配一个类标签)和实例分割(检测和分割每个对象实例)这两个典型的不同任务。提出的任务要求生成一个丰富而完整的连贯场景分割,这是向现实视觉系统迈出的重要一步。虽然计算机视觉的早期工作解决了相关的图像/场景分析任务,但这些任务目前并不流行,可能是由于缺乏适当的度量标准或相关的识别挑战。为了解决这一问题,我们提出了一种新的泛光质量(PQ)度量,它以一种可解释和统一的方式捕获所有类(事物和事物)的性能。利用所提出的度量标准,我们对现有的三个数据集上的ps的人和机器性能进行了严格的研究,揭示了有关该任务的有趣见解。我们的工作的目的是在一个更统一的图像分割的观点中恢复社区的兴趣。

URL

https://arxiv.org/abs/1801.00868

PDF

https://arxiv.org/pdf/1801.00868.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot