Paper Reading AI Learner

RadRotator: 3D Rotation of Radiographs with Diffusion Models

2024-04-19 16:55:12
Pouria Rouzrokh, Bardia Khosravi, Shahriar Faghani, Kellen L. Mulford, Michael J. Taunton, Bradley J. Erickson, Cody C. Wyles

Abstract

Transforming two-dimensional (2D) images into three-dimensional (3D) volumes is a well-known yet challenging problem for the computer vision community. In the medical domain, a few previous studies have attempted to convert two or more input radiographs into computed tomography (CT) volumes. Building on these efforts, we introduce a diffusion model-based technique that can rotate the anatomical content of any input radiograph in 3D space, potentially enabling the visualization of the radiograph's entire anatomical content from any viewpoint in 3D. Like previous studies, we used CT volumes to create Digitally Reconstructed Radiographs (DRRs) as training data for our model. However, we addressed two significant limitations of previous studies: 1. We utilized conditional diffusion models with classifier-free guidance instead of Generative Adversarial Networks (GANs) to achieve higher mode coverage and improved output image quality, with the only trade-off being slower inference, which is often less critical in medical applications. 2. We demonstrated that the unreliable outputs of style-transfer deep learning (DL) models such as Cycle-GAN, previously used to transfer the style of actual radiographs to DRRs, can be replaced with a simple yet effective training transformation that randomly alters the pixel-intensity histograms of the input and ground-truth imaging data during training. This transformation makes the diffusion model agnostic to variations in the pixel-intensity distribution of the input data, allowing a DL model to be trained reliably on DRRs and then applied, unchanged, to conventional radiographs (or DRRs) at inference time.

URL

https://arxiv.org/abs/2404.13000

PDF

https://arxiv.org/pdf/2404.13000.pdf
