Paper Reading AI Learner

COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes

2024-10-31 17:03:38
Muhammad Ali, Mamoona Javaid, Mubashir Noman, Mustansar Fiaz, Salman Khan

Abstract

Automated waste recycling aims to efficiently separate recyclable objects from waste using vision-based systems. However, the presence of objects of varying shapes and material types makes this a challenging problem, especially in cluttered environments. Existing segmentation methods perform reasonably well on many semantic segmentation datasets by employing multi-contextual representations; however, their performance degrades when applied to waste object segmentation in cluttered scenarios. In addition, plastic objects further increase the complexity of the problem due to their translucent nature. To address these limitations, we introduce an efficacious segmentation network, named COSNet, that uses boundary cues along with multi-contextual information to accurately segment objects in cluttered scenes. COSNet introduces novel components, including a feature sharpening block (FSB) and a boundary enhancement module (BEM), for enhancing features and highlighting the boundary information of irregular waste objects in cluttered environments. Extensive experiments on three challenging datasets, ZeroWaste-f, SpectralWaste, and ADE20K, demonstrate the effectiveness of the proposed method. Our COSNet achieves significant gains of 1.8% on the ZeroWaste-f and 2.1% on the SpectralWaste datasets, respectively, in terms of the mIoU metric.
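The reported gains are stated in mean Intersection-over-Union (mIoU), the standard semantic segmentation metric. As a quick reference, here is a minimal sketch of that metric (per-class IoU averaged over classes present in either mask); this illustrates the metric definition only, not code from the paper:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean IoU: average per-class intersection/union over label maps.

    Classes absent from both `pred` and `target` are skipped so they
    do not dilute the average.
    """
    ious = []
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both masks
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

A 1.8% gain on this metric means the class-averaged overlap between predicted and ground-truth masks improved by 1.8 percentage points.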


URL

https://arxiv.org/abs/2410.24139

PDF

https://arxiv.org/pdf/2410.24139.pdf

