Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans

2024-04-18 04:25:56
Lixing Tan, Shuang Song, Kangneng Zhou, Chengbo Duan, Lanying Wang, Huayang Ren, Linlin Liu, Wei Zhang, Ruoxiu Xiao

Abstract

X-ray images play a vital role in intraoperative procedures owing to their high resolution and fast imaging speed, and they greatly facilitate subsequent segmentation, registration and reconstruction. However, excessive X-ray exposure poses potential risks to human health. Data-driven algorithms that synthesize X-ray images from volume scans are restricted by the scarcity of paired X-ray and volume data. Existing methods mainly work by modelling the whole X-ray imaging procedure. In this study, we propose a learning-based approach termed CT2X-GAN that synthesizes X-ray images in an end-to-end manner using content and style disentanglement across three different image domains. Our method decouples the anatomical structure information from CT scans and the style information from unpaired real X-ray images and digitally reconstructed radiography (DRR) images via a series of decoupling encoders. Additionally, we introduce a novel consistency regularization term to improve the stylistic resemblance between synthesized and real X-ray images. We also impose a supervised process by computing the similarity between computed real DRR images and synthesized DRR images. We further develop a pose attention module to strengthen the information in the decoupled content code from CT scans, facilitating high-quality multi-view image synthesis in the lower-dimensional 2D space. Extensive experiments on the publicly available CTSpine1K dataset achieved 97.8350, 0.0842 and 3.0938 in terms of FID, KID and a defined user-scored X-ray similarity, respectively. In comparison with 3D-aware methods ($\pi$-GAN, EG3D), CT2X-GAN achieves superior synthesis quality and greater realism relative to real X-ray images.
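
For readers unfamiliar with DRR images, the following is a minimal sketch of how a DRR can be computed from a CT volume. It is not the CT2X-GAN pipeline or the authors' code. It assumes a simplified parallel-beam geometry (clinical C-arm systems are cone-beam), a (z, y, x) volume layout, and an illustrative water attenuation value; the names `MU_WATER`, `hu_to_mu` and `drr_parallel` are hypothetical.

```python
import numpy as np

# Assumed linear attenuation coefficient of water in 1/mm (illustrative value only).
MU_WATER = 0.02


def hu_to_mu(volume_hu, mu_water=MU_WATER):
    """Convert Hounsfield units to linear attenuation coefficients (1/mm)."""
    mu = mu_water * (1.0 + np.asarray(volume_hu, dtype=float) / 1000.0)
    # Air (-1000 HU) maps to 0; clip anything below that to avoid negative attenuation.
    return np.clip(mu, 0.0, None)


def drr_parallel(volume_hu, axis, spacing_mm=1.0):
    """Parallel-beam DRR: sum attenuation along one axis, then apply Beer-Lambert."""
    line_integral = hu_to_mu(volume_hu).sum(axis=axis) * spacing_mm
    # Transmitted fraction I / I0, which lies in (0, 1].
    return np.exp(-line_integral)


if __name__ == "__main__":
    # Toy volume in assumed (z, y, x) order: air with a small water block inside.
    ct = np.full((32, 32, 32), -1000.0)
    ct[8:24, 12:20, 10:22] = 0.0

    # Two axis-aligned views; arbitrary angles would need resampling of the volume.
    view_along_y = drr_parallel(ct, axis=1)
    view_along_x = drr_parallel(ct, axis=2)
    print(view_along_y.shape, view_along_x.shape)
```

Projecting along different axes gives a crude form of multi-view output. The learned approach described in the abstract instead synthesizes the views with a generator conditioned on content and style codes.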

URL

https://arxiv.org/abs/2404.11889

PDF

https://arxiv.org/pdf/2404.11889.pdf

