Abstract
The advent of the Vision Transformer (ViT) has brought substantial advances on 3D volumetric benchmarks, particularly in 3D medical image segmentation. Concurrently, Multi-Layer Perceptron (MLP) networks have regained popularity among researchers because they achieve results comparable to ViTs while omitting the heavy self-attention module. This paper introduces a permutable hybrid network for volumetric medical image segmentation, named PHNet, which exploits the complementary strengths of convolutional neural networks (CNNs) and MLPs. PHNet addresses the intrinsic isotropy problem of 3D volumetric data by using both 2D and 3D CNNs to extract local information. In addition, we propose an efficient Multi-Layer Permute Perceptron module, named MLPP, which enhances the original MLP by capturing long-range dependencies while retaining positional information. Extensive experiments validate that PHNet outperforms state-of-the-art methods on two public datasets, COVID-19-20 and Synapse. Moreover, an ablation study demonstrates the effectiveness of PHNet in harnessing the strengths of both CNNs and MLPs. The code will be made publicly available upon acceptance.
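The core idea behind a permute-style MLP module is to mix information along each spatial axis by moving that axis into the last position, applying a linear projection, and moving it back, so long-range interactions are captured while every other axis keeps its positional layout. The sketch below is a minimal NumPy illustration of that general mechanism; the function names, branch structure, and weight shapes are illustrative assumptions, not the exact MLPP design from the paper.

```python
import numpy as np

def axis_mlp(x, w, axis):
    """Mix information along one axis with a linear layer.

    Moves `axis` to the last position, applies x @ w, then moves it
    back, so tokens along that axis exchange information (long-range
    mixing) while the remaining axes keep their positions.
    """
    x = np.moveaxis(x, axis, -1)      # bring the target axis last
    x = x @ w                         # linear mixing along that axis
    return np.moveaxis(x, -1, axis)   # restore the original layout

def permute_mlp_block(x, w_c, w_h, w_w):
    """Toy permute-MLP block: sum of per-axis mixings (hypothetical)."""
    return (axis_mlp(x, w_c, 0)       # mix along channels
            + axis_mlp(x, w_h, 1)     # mix along height
            + axis_mlp(x, w_w, 2))    # mix along width

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))  # (channels, height, width)
w_c = rng.standard_normal((8, 8))
w_h = rng.standard_normal((16, 16))
w_w = rng.standard_normal((16, 16))
y = permute_mlp_block(x, w_c, w_h, w_w)
print(y.shape)  # (8, 16, 16): spatial layout is preserved
```

Because each branch only permutes axes and applies a matrix product, the output retains the input's shape and positional structure, which is the property the abstract attributes to MLPP.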
URL
https://arxiv.org/abs/2303.13111