Paper Reading AI Learner

MeshBrush: Painting the Anatomical Mesh with Neural Stylization for Endoscopy

2024-04-03 18:40:48
John J. Han, Ayberk Acar, Nicholas Kavoussi, Jie Ying Wu

Abstract

Style transfer is a promising approach to closing the sim-to-real gap in medical endoscopy. Rendering endoscopic videos by traversing pre-operative scans (such as MRI or CT) can produce realistic simulations along with ground-truth camera poses and depth maps. Although image-to-image (I2I) translation models such as CycleGAN perform well, they are unsuitable for video-to-video synthesis due to their lack of temporal consistency, which produces artifacts between frames. We propose MeshBrush, a neural mesh stylization method that synthesizes temporally consistent videos with differentiable rendering. MeshBrush exploits the underlying geometry of patient imaging data while leveraging existing I2I methods. With learned per-vertex textures, the stylized mesh guarantees consistency while producing high-fidelity outputs. We demonstrate that mesh stylization is a promising approach for creating realistic simulations for downstream tasks such as training and preoperative planning. Although our method is designed and tested for ureteroscopy, its components transfer to general endoscopic and laparoscopic procedures.
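The core idea in the abstract — learning per-vertex textures by pushing an image-space style loss back through a differentiable rendering step — can be illustrated with a minimal pure-NumPy sketch. Here a single triangle stands in for the anatomical mesh, fixed barycentric weights stand in for rasterization, and the style targets are synthetic; all of the names, weights, and optimizer settings below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for per-vertex texture learning: one triangle whose three
# vertices each carry a learnable RGB color. (MeshBrush optimizes
# per-vertex textures on a full anatomical mesh through a differentiable
# renderer; this only sketches the principle.)
vertex_colors = np.full((3, 3), 0.5)       # 3 vertices x RGB, start gray

# Fixed barycentric weights for 64 sampled "pixels". A real
# differentiable renderer produces these during rasterization; here they
# are random convex combinations of the three vertices.
bary = rng.dirichlet(np.ones(3), size=64)  # 64 pixels x 3 vertices

# Hypothetical style targets: pixel colors an I2I model (e.g. CycleGAN)
# might assign. Generated from hidden ground-truth vertex colors so the
# toy problem is exactly solvable.
true_colors = rng.uniform(0.0, 1.0, size=(3, 3))
target = bary @ true_colors

def render(colors):
    """Interpolate vertex colors over the pixels (linear in the colors,
    hence trivially differentiable)."""
    return bary @ colors

lr, losses = 0.5, []
for _ in range(200):
    residual = render(vertex_colors) - target
    losses.append(float((residual ** 2).mean()))
    # Analytic gradient of the mean-squared error w.r.t. vertex colors.
    grad = 2.0 * bary.T @ residual / residual.size
    vertex_colors -= lr * grad

print(f"loss: {losses[0]:.5f} -> {losses[-1]:.5f}")
```

Because rendering is linear in the vertex colors here, plain gradient descent drives the rendered pixels onto the targets; MeshBrush applies the same principle with a full differentiable rasterizer over the patient mesh and style targets derived from an I2I model.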

URL

https://arxiv.org/abs/2404.02999

PDF

https://arxiv.org/pdf/2404.02999.pdf
