WiTUnet: A U-Shaped Architecture Integrating CNN and Transformer for Improved Feature Alignment and Local Information Fusion

Abstract

Low-dose computed tomography (LDCT) has become the technology of choice for diagnostic medical imaging because it delivers a lower radiation dose than standard CT, although it increases image noise and can affect diagnostic accuracy. To address this, advanced deep learning-based LDCT denoising algorithms have been developed, primarily using Convolutional Neural Networks (CNNs) or Transformer networks with the Unet architecture. This architecture enhances image detail by integrating feature maps from the encoder and decoder via skip connections. However, current methods often overlook enhancements to the Unet architecture itself, focusing instead on optimizing the encoder and decoder structures. This can be problematic because feature map characteristics differ significantly between the encoder and decoder, so simple fusion strategies may not effectively reconstruct images. In this paper, we introduce WiTUnet, a novel LDCT image denoising method that uses nested, dense skip pathways instead of traditional skip connections to improve feature integration. WiTUnet also incorporates a windowed Transformer structure that processes images in smaller, non-overlapping segments, reducing computational load. Additionally, a Local Image Perception Enhancement (LiPe) module in both the encoder and decoder replaces the standard multi-layer perceptron (MLP) in Transformers, enhancing local feature capture and representation. In extensive experimental comparisons, WiTUnet outperforms existing methods on key metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Root Mean Square Error (RMSE), significantly improving noise removal and image quality.
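
The windowed processing mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `window_partition` and `window_reverse`, the channels-last `(H, W, C)` layout, and the window size are all assumptions made for illustration. It only shows how a feature map can be split into non-overlapping windows so that attention could be computed within each window rather than over the whole image.

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows.

    Hypothetical helper for illustration; returns (num_windows, win*win, C).
    """
    h, w, c = x.shape
    if h % win or w % win:
        raise ValueError("H and W must be divisible by the window size")
    # Axes after reshape: (row block, row in block, col block, col in block, C)
    x = x.reshape(h // win, win, w // win, win, c)
    # Bring the two block axes together, then flatten each window's pixels
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, c)

def window_reverse(windows, win, h, w):
    """Inverse of window_partition: reassemble windows into (H, W, C)."""
    c = windows.shape[-1]
    x = windows.reshape(h // win, w // win, win, win, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h, w, c)

# Example: an 8x8 map with 2 channels split into four 4x4 windows
feature_map = np.arange(8 * 8 * 2).reshape(8, 8, 2)
windows = window_partition(feature_map, 4)
restored = window_reverse(windows, 4, 8, 8)
```

Under these assumptions, self-attention restricted to each window compares every pixel with only `win * win` others instead of all `H * W`, which is the general reason windowing reduces the cost of attention on large images.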

URL

https://arxiv.org/abs/2404.09533

PDF

https://arxiv.org/pdf/2404.09533.pdf
