Paper Reading AI Learner

JPPF: Multi-task Fusion for Consistent Panoptic-Part Segmentation

2023-11-30 15:17:46
Shishir Muralidhara, Sravan Kumar Jagadeesh, René Schuster, Didier Stricker

Abstract

Part-aware panoptic segmentation is a computer vision problem that aims to provide a semantic understanding of the scene at multiple levels of granularity. More precisely, semantic areas, object instances, and semantic parts are predicted simultaneously. In this paper, we present our Joint Panoptic Part Fusion (JPPF) that effectively combines the three individual segmentations to obtain a panoptic-part segmentation. Two aspects are of utmost importance for this: first, a unified model for the three problems is desired that allows for mutually improved and consistent representation learning; second, the combination must be balanced so that all individual results receive equal importance during fusion. Our proposed JPPF is parameter-free and dynamically balances its inputs. The method is evaluated and compared on the Cityscapes Panoptic Parts (CPP) and Pascal Panoptic Parts (PPP) datasets in terms of PartPQ and Part-Whole Quality (PWQ). In extensive experiments, we verify the importance of our fair fusion, highlight its most significant impact on areas that can be further segmented into parts, and demonstrate the generalization capabilities of our design, without fine-tuning, on 5 additional datasets.
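The abstract does not spell out the fusion rule, but the core idea of a parameter-free, dynamically balanced ("fair") combination can be sketched. Purely for illustration, assume each of the three heads produces per-pixel scores over a shared label space; the hypothetical `fair_fusion` below normalizes each head into a probability distribution, so no branch dominates by scale, and averages them with equal weight before taking the per-pixel argmax. The actual JPPF operates on distinct panoptic and part label spaces and is more involved than this sketch.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fair_fusion(*head_logits):
    """Illustrative parameter-free fusion (not the paper's exact method):
    each head's (H, W, C) logits are converted to probabilities so that
    no branch dominates by scale, then the heads are averaged with equal
    weight and the fused label is the per-pixel argmax."""
    probs = [softmax(l) for l in head_logits]
    fused = np.mean(probs, axis=0)
    return fused.argmax(axis=-1)

# Toy example: three heads over a 2x2 image with 3 classes.
rng = np.random.default_rng(0)
heads = [rng.normal(size=(2, 2, 3)) for _ in range(3)]
labels = fair_fusion(*heads)
```

The normalization step is what makes the combination "balanced": raw logits from differently trained heads can have very different magnitudes, and averaging probabilities instead of logits prevents one head from silently outweighing the others.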

URL

https://arxiv.org/abs/2311.18618

PDF

https://arxiv.org/pdf/2311.18618.pdf

