Paper Reading AI Learner

A Permutable Hybrid Network for Volumetric Medical Image Segmentation

2023-03-23 08:59:09
Yi Lin, Xiao Fang, Dong Zhang, Kwang-Ting Cheng, Hao Chen

Abstract

The advent of the Vision Transformer (ViT) has brought substantial advancements to 3D volumetric benchmarks, particularly in 3D medical image segmentation. Concurrently, Multi-Layer Perceptron (MLP) networks have regained popularity among researchers due to results comparable to ViT's, achieved without the heavy self-attention module. This paper introduces a permutable hybrid network for volumetric medical image segmentation, named PHNet, which exploits the advantages of convolutional neural networks (CNNs) and MLPs. PHNet addresses the intrinsic anisotropy problem of 3D volumetric data by utilizing both 2D and 3D CNNs to extract local information. In addition, we propose an efficient Multi-Layer Permute Perceptron module, named MLPP, which enhances the original MLP by capturing long-range dependencies while retaining positional information. Extensive experimental results validate that PHNet outperforms state-of-the-art methods on two public datasets, COVID-19-20 and Synapse. Moreover, an ablation study demonstrates the effectiveness of PHNet in harnessing the strengths of both CNNs and MLPs. The code will be made publicly available upon acceptance.
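To make the permute-perceptron idea concrete, here is a minimal NumPy sketch of axial token mixing: each spatial axis of a feature volume is moved into the last position, mixed with a small linear layer, and moved back. Because each axis gets its own position-indexed weight matrix, the block captures long-range dependencies along every axis without self-attention while retaining positional information. Note this is an illustrative sketch of the general permute-MLP principle, not the paper's actual MLPP module; all function names, shapes, and weight layouts here are assumptions for illustration.

```python
import numpy as np

def axis_mlp(x, axis, W, b):
    """Mix information along one axis: move `axis` last, apply a linear map, move back."""
    xm = np.moveaxis(x, axis, -1)   # (..., L) with L = x.shape[axis]
    ym = xm @ W + b                 # linear mixing across positions on that axis
    return np.moveaxis(ym, -1, axis)

def permute_mlp_block(x, weights):
    """Illustrative permute-MLP block: sum of axial mixings along D, H, W
    plus a channel MLP, with a residual connection.
    x: (D, H, W, C) feature volume."""
    out = np.zeros_like(x)
    for axis in (0, 1, 2):          # depth, height, width mixing
        Wm, bm = weights[axis]
        out += axis_mlp(x, axis, Wm, bm)
    Wc, bc = weights['c']           # channel mixing
    out += x @ Wc + bc
    return x + out                  # residual connection

rng = np.random.default_rng(0)
D, H, W, C = 4, 8, 8, 16
x = rng.standard_normal((D, H, W, C))
weights = {
    0:  (0.01 * rng.standard_normal((D, D)), np.zeros(D)),
    1:  (0.01 * rng.standard_normal((H, H)), np.zeros(H)),
    2:  (0.01 * rng.standard_normal((W, W)), np.zeros(W)),
    'c': (0.01 * rng.standard_normal((C, C)), np.zeros(C)),
}
y = permute_mlp_block(x, weights)
print(y.shape)  # (4, 8, 8, 16)
```

Each output position depends on the entire row, column, and depth line it lies on, which is how MLP-style mixing obtains a large receptive field at a cost linear in the axis length, in contrast to the quadratic cost of self-attention.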

URL

https://arxiv.org/abs/2303.13111

PDF

https://arxiv.org/pdf/2303.13111.pdf

