The revenge of BiSeNet: Efficient Multi-Task Image Segmentation

2024-04-15 08:32:18
Gabriele Rosi, Claudia Cuttano, Niccolò Cavagnero, Giuseppe Averta, Fabio Cermelli

Abstract

Recent advancements in image segmentation have focused on enhancing the efficiency of the models to meet the demands of real-time applications, especially on edge devices. However, existing research has primarily concentrated on single-task settings, particularly semantic segmentation, leading to redundant efforts and specialized architectures for different tasks. To address this limitation, we propose a novel architecture for efficient multi-task image segmentation, capable of handling various segmentation tasks without sacrificing efficiency or accuracy. We introduce BiSeNetFormer, which leverages the efficiency of two-stream semantic segmentation architectures and extends them into a mask classification framework. Our approach maintains the efficient spatial and context paths to capture detailed and semantic information, respectively, while leveraging an efficient transformer-based segmentation head that computes the binary masks and class probabilities. By seamlessly supporting multiple tasks, namely semantic and panoptic segmentation, BiSeNetFormer offers a versatile solution for multi-task segmentation. We evaluate our approach on two popular datasets, Cityscapes and ADE20K, demonstrating impressive inference speeds while maintaining competitive accuracy compared to state-of-the-art architectures. Our results indicate that BiSeNetFormer represents a significant advancement towards fast, efficient, and multi-task segmentation networks, bridging the gap between model efficiency and task adaptability.
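
The abstract does not say how the binary masks and class probabilities are combined into a segmentation map. The sketch below assumes the inference rule commonly used in mask classification frameworks: each query's class probabilities are weighted by its per-pixel mask probability, the results are summed over queries, and each pixel takes the highest-scoring class. It is only an illustration under that assumption, not the authors' confirmed procedure. The function name, the toy shapes, and the "no object" last column are all hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def semantic_from_mask_classification(class_logits, mask_logits):
    """Turn per-query predictions into a per-pixel label map.

    class_logits: Q x (K + 1) nested list; the last column is assumed
                  to be a "no object" class and is dropped.
    mask_logits:  Q x H x W nested list of binary-mask logits.
    Returns an H x W nested list of class indices in [0, K).
    """
    num_classes = len(class_logits[0]) - 1
    # Per-query class probabilities, without the "no object" entry.
    class_probs = [softmax(row)[:num_classes] for row in class_logits]
    height = len(mask_logits[0])
    width = len(mask_logits[0][0])

    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            scores = [0.0] * num_classes
            for q, probs in enumerate(class_probs):
                mask_prob = sigmoid(mask_logits[q][y][x])
                for k in range(num_classes):
                    # Weight each class probability by the mask probability.
                    scores[k] += probs[k] * mask_prob
            row.append(max(range(num_classes), key=scores.__getitem__))
        labels.append(row)
    return labels


# Toy input: 2 queries, 2 classes (+ "no object"), a 1 x 2 image.
toy_class_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
toy_mask_logits = [[[4.0, -4.0]], [[-4.0, 4.0]]]
toy_labels = semantic_from_mask_classification(toy_class_logits, toy_mask_logits)
```

In this toy input, query 0 favors class 0 and covers the left pixel, while query 1 favors class 1 and covers the right pixel, so the label map assigns class 0 on the left and class 1 on the right. Panoptic output would need an extra step that assigns pixels to individual queries, which this sketch does not cover.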

URL

https://arxiv.org/abs/2404.09570

PDF

https://arxiv.org/pdf/2404.09570.pdf

