Cross-CBAM: A Lightweight network for Scene Segmentation

2023-06-04 09:03:05
Zhengbin Zhang, Zhenhao Xu, Xingsheng Gu, Juan Xiong

Abstract

Scene parsing is a major challenge for real-time semantic segmentation. Although traditional semantic segmentation networks have made remarkable progress in accuracy, their inference speed remains unsatisfactory. Moreover, this progress has been achieved with fairly large networks and powerful computational resources. Such large models are difficult to run on edge computing devices with limited computing power, which poses a serious challenge for real-time semantic segmentation. In this paper, we present the Cross-CBAM network, a novel lightweight network for real-time semantic segmentation. Specifically, a Squeeze-and-Excitation Atrous Spatial Pyramid Pooling module (SE-ASPP) is proposed to obtain a variable field of view and multiscale information. We also propose a Cross Convolutional Block Attention Module (CCBAM), which employs a cross-multiply operation so that high-level semantic information guides low-level detail information. Unlike previous work, which uses attention to focus on the desired information in the backbone, CCBAM uses cross-attention for feature fusion in the FPN structure. Extensive experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of the proposed Cross-CBAM model, which achieves a promising trade-off between segmentation accuracy and inference speed. On the Cityscapes test set, we achieve 73.4% mIoU at 240.9 FPS and 77.2% mIoU at 88.6 FPS on an NVIDIA GTX 1080Ti.
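The abstract does not spell out how the cross-multiply operation is wired, so the following is only a rough sketch of one plausible reading, not the authors' implementation. It assumes that a channel gate computed from the high-level features reweights the low-level features, while a spatial gate computed from the low-level features reweights the upsampled high-level features. The function names (`channel_gate`, `spatial_gate`, `cross_fuse`), the nearest-neighbour upsampling, and the final sum are all hypothetical, and the learned layers that a real CBAM-style module would contain (shared MLP, convolutions) are omitted.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(feat):
    """Per-channel weights in (0, 1) from global average pooling; shape (C, 1, 1)."""
    return sigmoid(feat.mean(axis=(1, 2), keepdims=True))


def spatial_gate(feat):
    """Per-pixel weights in (0, 1) from the channel-wise mean; shape (1, H, W)."""
    return sigmoid(feat.mean(axis=0, keepdims=True))


def upsample_nearest(feat, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def cross_fuse(low, high):
    """Fuse low-level (C, H, W) and high-level (C, H/k, W/k) feature maps.

    Assumption: both maps have the same channel count and the spatial
    ratio k is an integer. This is an illustrative guess at what a
    cross-multiply fusion could look like, without learned parameters.
    """
    high_up = upsample_nearest(high, low.shape[1] // high.shape[1])
    # Cross-multiply: gates computed from one branch reweight the other.
    low_guided = low * channel_gate(high_up)    # semantics guide detail
    high_guided = high_up * spatial_gate(low)   # detail localises semantics
    return low_guided + high_guided


rng = np.random.default_rng(0)
low = rng.standard_normal((8, 16, 16))   # low-level, high-resolution features
high = rng.standard_normal((8, 8, 8))    # high-level, low-resolution features
print(cross_fuse(low, high).shape)       # (8, 16, 16)
```

The gates here rely on NumPy broadcasting: a `(C, 1, 1)` channel gate and a `(1, H, W)` spatial gate each stretch to the full `(C, H, W)` feature map when multiplied. In an FPN-style decoder, a fusion like this would be applied at each level where a coarser map is merged into a finer one.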

URL

https://arxiv.org/abs/2306.02306

PDF

https://arxiv.org/pdf/2306.02306.pdf

