Cross-CBAM: A Lightweight network for Scene Segmentation

2023-06-04 09:03:05
Zhengbin Zhang, Zhenhao Xu, Xingsheng Gu, Juan Xiong


Scene parsing poses a major challenge for real-time semantic segmentation. Although traditional semantic segmentation networks have made remarkable advances in accuracy, their inference speed remains unsatisfactory. Moreover, this progress has been achieved with fairly large networks and powerful computational resources. Such large models are difficult to run on edge computing devices with limited computing power, which poses a huge challenge for real-time semantic segmentation tasks. In this paper, we present the Cross-CBAM network, a novel lightweight network for real-time semantic segmentation. Specifically, a Squeeze-and-Excitation Atrous Spatial Pyramid Pooling module (SE-ASPP) is proposed to obtain a variable field of view and multiscale information. We also propose a Cross Convolutional Block Attention Module (CCBAM), which employs a cross-multiply operation so that high-level semantic information guides low-level detail information. Unlike previous works, which use attention to focus on the desired information in the backbone, CCBAM uses cross-attention for feature fusion in the FPN structure. Extensive experiments on the Cityscapes and CamVid datasets demonstrate the effectiveness of the proposed Cross-CBAM model, which achieves a promising trade-off between segmentation accuracy and inference speed. On the Cityscapes test set, we achieve 73.4% mIoU at 240.9 FPS and 77.2% mIoU at 88.6 FPS on an NVIDIA GTX 1080Ti.
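The abstract does not spell out the internals of CCBAM, so the following is only a minimal NumPy sketch of the general idea of a cross-multiply fusion step, under stated assumptions: attention gates are computed from the high-level feature map and multiplied onto the low-level one before the two are summed. The function names (`channel_gate`, `spatial_gate`, `cross_fuse`) are hypothetical, and the learned layers of a CBAM-style block (the shared MLP and the spatial convolution) are deliberately omitted, so this should not be read as the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_gate(feat):
    # Assumption: parameter-free stand-in for CBAM-style channel attention.
    # Average- and max-pooled channel descriptors are summed and squashed.
    avg = feat.mean(axis=(1, 2), keepdims=True)  # shape (C, 1, 1)
    mx = feat.max(axis=(1, 2), keepdims=True)    # shape (C, 1, 1)
    return sigmoid(avg + mx)


def spatial_gate(feat):
    # Assumption: parameter-free stand-in for CBAM-style spatial attention.
    avg = feat.mean(axis=0, keepdims=True)       # shape (1, H, W)
    mx = feat.max(axis=0, keepdims=True)         # shape (1, H, W)
    return sigmoid(avg + mx)


def upsample_nearest(feat, factor):
    # Nearest-neighbour upsampling along height and width.
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)


def cross_fuse(low, high):
    """Fuse a low-level map (C, H, W) with a coarser high-level map.

    The gates come from the high-level features and are multiplied onto
    the low-level features, so semantics guide detail (hypothetical design).
    """
    factor = low.shape[1] // high.shape[1]
    high_up = upsample_nearest(high, factor)
    gate = channel_gate(high_up) * spatial_gate(high_up)  # broadcasts to (C, H, W)
    return low * gate + high_up


rng = np.random.default_rng(0)
low = rng.standard_normal((4, 8, 8))   # fine-resolution, low-level features
high = rng.standard_normal((4, 4, 4))  # coarse-resolution, high-level features
fused = cross_fuse(low, high)
print(fused.shape)  # (4, 8, 8)
```

In an FPN-style decoder, a step like this would be applied at each stage where a coarse feature map meets a finer one. A trained version would replace the parameter-free gates with learned layers.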




