Boosting Semantic Segmentation with Semantic Boundaries

2023-04-19 05:53:54
Haruya Ishikawa, Yoshimitsu Aoki

Abstract

In this paper, we present the Semantic Boundary Conditioned Backbone (SBCB) framework, a simple yet effective, model-agnostic training framework that boosts segmentation performance, especially around boundaries. Motivated by recent work that improves semantic segmentation by incorporating boundaries as auxiliary tasks, we propose a multi-task framework that uses semantic boundary detection (SBD) as an auxiliary task. The SBCB framework exploits the fact that the SBD task is complementary to semantic segmentation to improve the backbone that feeds the segmentation head. We apply an SBD head that exploits the multi-scale features from the backbone, where the model learns low-level features in the earlier stages and high-level semantic understanding in the later stages. This head complements common semantic segmentation architectures, in which features from the later stages are used for classification. Because only the backbone is conditioned, semantic segmentation models improve without additional parameters at inference. Through extensive evaluations, we show the effectiveness of the SBCB framework: it improves various popular segmentation heads and backbones by 0.5% to 3.0% IoU on the Cityscapes dataset and by 1.6% to 4.1% in boundary F-scores. We also apply the framework to customized backbones and emerging vision transformer models and show that it is effective there as well.
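
To make the auxiliary SBD task concrete, the sketch below shows one way per-class boundary targets could be derived from an ordinary segmentation label map. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how boundary labels are produced, and the function name `semantic_boundaries`, the 4-connected neighbourhood, and the one-pixel boundary width are all hypothetical choices made here.

```python
from typing import Dict, List


def semantic_boundaries(labels: List[List[int]]) -> Dict[int, List[List[int]]]:
    """Return one binary boundary map per class id found in `labels`.

    Assumption (not from the paper): a pixel is a boundary pixel of class c
    when it carries label c and at least one of its 4-connected neighbours
    carries a different label.
    """
    height, width = len(labels), len(labels[0])
    classes = sorted({c for row in labels for c in row})
    maps = {c: [[0] * width for _ in range(height)] for c in classes}

    for y in range(height):
        for x in range(width):
            c = labels[y][x]
            # Look at the up, down, left, and right neighbours inside the image.
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and labels[ny][nx] != c:
                    maps[c][y][x] = 1
                    break
    return maps


# Toy label map: class 0 on the left half, class 1 on the right half.
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
boundaries = semantic_boundaries(toy)
```

On the toy map, each class gets a single-pixel-wide boundary column where the two regions meet. In a multi-task setup of the kind the abstract describes, maps like these would serve as targets for the SBD head during training only, while the segmentation head keeps using the original label map.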

URL

https://arxiv.org/abs/2304.09427

PDF

https://arxiv.org/pdf/2304.09427.pdf

