Paper Reading AI Learner

WiTUnet: A U-Shaped Architecture Integrating CNN and Transformer for Improved Feature Alignment and Local Information Fusion

2024-04-15 07:53:07
Bin Wang, Fei Deng, Peifan Jiang, Shuang Wang, Xiao Han, Hongjie Zheng

Abstract

Low-dose computed tomography (LDCT) has become the technology of choice for diagnostic medical imaging, given its lower radiation dose compared to standard CT, despite the increased image noise that can affect diagnostic accuracy. To address this, advanced deep learning-based LDCT denoising algorithms have been developed, primarily using Convolutional Neural Networks (CNNs) or Transformer networks with the Unet architecture. This architecture enhances image detail by integrating feature maps from the encoder and decoder via skip connections. However, current methods often overlook enhancements to the Unet architecture itself, focusing instead on optimizing encoder and decoder structures. This approach can be problematic due to the significant differences in feature map characteristics between the encoder and decoder, where simple fusion strategies may not effectively reconstruct images. In this paper, we introduce WiTUnet, a novel LDCT image denoising method that utilizes nested, dense skip pathways instead of traditional skip connections to improve feature integration. WiTUnet also incorporates a windowed Transformer structure to process images in smaller, non-overlapping segments, reducing computational load. Additionally, the integration of a Local Image Perception Enhancement (LiPe) module in both the encoder and decoder replaces the standard multi-layer perceptron (MLP) in Transformers, enhancing local feature capture and representation. Through extensive experimental comparisons, WiTUnet has demonstrated superior performance over existing methods in key metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Root Mean Square Error (RMSE), significantly improving noise removal and image quality.
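
The abstract names two architectural ingredients: a windowed Transformer that attends only within small non-overlapping image segments, and a LiPe module that replaces the Transformer's MLP with a locally aware feed-forward stage. The PyTorch sketch below illustrates that combination in outline only; the module names, the depthwise-convolution form of the local-perception stage, and all dimensions are assumptions made for illustration, not the authors' implementation.

# Hypothetical sketch, not the authors' code: a windowed self-attention block whose
# feed-forward MLP is replaced by a depthwise-convolution "local perception" stage,
# in the spirit of the LiPe idea described above.
import torch
import torch.nn as nn


def window_partition(x, ws):
    # (B, H, W, C) -> (num_windows*B, ws*ws, C); assumes H and W are multiples of ws
    B, H, W, C = x.shape
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)


def window_reverse(win, ws, H, W):
    # inverse of window_partition: (num_windows*B, ws*ws, C) -> (B, H, W, C)
    B = win.shape[0] // ((H // ws) * (W // ws))
    x = win.reshape(B, H // ws, W // ws, ws, ws, -1)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)


class LocalPerception(nn.Module):
    # Stand-in for LiPe (assumed design): pointwise expansion, depthwise 3x3 conv
    # for local context, pointwise projection back to the input width.
    def __init__(self, dim, expansion=4):
        super().__init__()
        hidden = dim * expansion
        self.net = nn.Sequential(
            nn.Conv2d(dim, hidden, 1),
            nn.GELU(),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden),
            nn.GELU(),
            nn.Conv2d(hidden, dim, 1),
        )

    def forward(self, x):            # x: (B, C, H, W)
        return self.net(x)


class WindowedTransformerBlock(nn.Module):
    def __init__(self, dim, num_heads=4, window_size=8):
        super().__init__()
        self.ws = window_size
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.lipe = LocalPerception(dim)

    def forward(self, x):            # x: (B, C, H, W)
        _, _, H, W = x.shape
        # self-attention restricted to non-overlapping ws x ws windows
        t = x.permute(0, 2, 3, 1)                        # (B, H, W, C)
        win = window_partition(self.norm1(t), self.ws)   # (nW*B, ws*ws, C)
        attn_out, _ = self.attn(win, win, win)
        t = t + window_reverse(attn_out, self.ws, H, W)
        x = t.permute(0, 3, 1, 2)                        # (B, C, H, W)
        # convolutional local-perception feed-forward in place of the usual MLP
        n = self.norm2(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        return x + self.lipe(n)


# Usage: H and W must be divisible by window_size.
block = WindowedTransformerBlock(dim=64)
out = block(torch.randn(1, 64, 64, 64))   # -> torch.Size([1, 64, 64, 64])

Restricting attention to fixed-size windows is what keeps the cost linear in image size rather than quadratic, which is why the abstract frames it as a way to reduce computational load.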

Abstract (translated)

Low-dose CT (LDCT) has become the technology of choice for diagnostic medical imaging; although its radiation dose is lower than that of standard CT, the increased image noise can affect diagnostic accuracy. To address this problem, advanced deep learning-based LDCT denoising algorithms have been developed, primarily using convolutional neural networks (CNNs) or Transformer networks with the Unet architecture. This architecture enhances image detail by integrating encoder and decoder feature maps through skip connections. However, current methods usually neglect improvements to the Unet architecture itself and focus instead on optimizing the encoder and decoder structures. This is problematic because of the significant differences between encoder and decoder feature maps, where simple fusion strategies may not reconstruct images effectively. We introduce WiTUnet, a novel LDCT image denoising method that uses nested, dense skip pathways instead of traditional skip connections to improve feature integration. WiTUnet also introduces a windowed Transformer structure that processes images as smaller, non-overlapping segments, reducing the computational load. In addition, a Local Image Perception Enhancement (LiPe) module is introduced in both the encoder and decoder to replace the standard multi-layer perceptron (MLP) in the Transformer, enhancing local feature capture and representation. Through extensive experimental comparisons, WiTUnet shows superior performance on key metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Root Mean Square Error (RMSE), significantly improving noise removal and image quality.

URL

https://arxiv.org/abs/2404.09533

PDF

https://arxiv.org/pdf/2404.09533.pdf

