Paper Reading AI Learner

High-Quality Medical Image Generation from Free-hand Sketch

2024-02-01 05:51:03
Quan Huu Cap, Atsushi Fukuda

Abstract

Generating medical images from human-drawn free-hand sketches holds promise for various important medical imaging applications. Because free-hand sketch data are extremely difficult to collect in the medical domain, most deep learning-based methods are instead trained to generate medical images from synthesized sketches (e.g., edge maps or contours of segmentation masks extracted from real images). However, these models often fail to generalize to free-hand sketches, leading to unsatisfactory results. In this paper, we propose a practical free-hand sketch-to-image generation model called Sketch2MedI that learns to represent sketches in StyleGAN's latent space and to generate medical images from that representation. Because it encodes sketches into this meaningful representation space, Sketch2MedI requires only synthesized sketches for training, enabling a cost-effective learning process. Sketch2MedI generalizes robustly to free-hand sketches, producing high-quality, realistic medical images. Comparative evaluations against the pix2pix, CycleGAN, UNIT, and U-GAT-IT models show that Sketch2MedI generates superior pharyngeal images, both quantitatively and qualitatively, across various metrics.
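The abstract describes a two-stage pipeline: an encoder maps a sketch into StyleGAN's latent space, and a StyleGAN generator decodes that latent code into a realistic image. Below is a minimal NumPy sketch of this data flow only, with hypothetical shapes (a 64x64 sketch, a 512-dimensional latent) and untrained linear stand-ins for the encoder and generator; the actual Sketch2MedI model uses trained networks and a pretrained StyleGAN:

```python
import numpy as np

rng = np.random.default_rng(0)

SKETCH_SHAPE = (64, 64)  # small toy resolution; real inputs would be larger
LATENT_DIM = 512         # StyleGAN latent codes are commonly 512-dimensional

# Stand-in "encoder": flatten the sketch and project it into the latent space.
W_enc = rng.normal(scale=0.01, size=(SKETCH_SHAPE[0] * SKETCH_SHAPE[1], LATENT_DIM))

def encode_sketch(sketch: np.ndarray) -> np.ndarray:
    """Map a binary free-hand sketch to a latent code w."""
    return sketch.reshape(-1) @ W_enc

# Stand-in "generator": project the latent code back to RGB image space.
W_gen = rng.normal(scale=0.01, size=(LATENT_DIM, SKETCH_SHAPE[0] * SKETCH_SHAPE[1] * 3))

def generate_image(w: np.ndarray) -> np.ndarray:
    """Decode a latent code into an RGB image in [-1, 1] (here via tanh)."""
    return np.tanh(w @ W_gen).reshape(*SKETCH_SHAPE, 3)

# A synthetic "sketch": a sparse binary stroke pattern.
sketch = (rng.random(SKETCH_SHAPE) > 0.95).astype(np.float32)
w = encode_sketch(sketch)
image = generate_image(w)
print(w.shape, image.shape)  # (512,) (64, 64, 3)
```

The key point this illustrates is why training on synthesized sketches can transfer to free-hand ones: both sketch types are compressed into the same low-dimensional latent space before generation, so the generator never sees the sketch directly.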

URL

https://arxiv.org/abs/2402.00353

PDF

https://arxiv.org/pdf/2402.00353.pdf

