
orGAN: A Synthetic Data Augmentation Pipeline for Simultaneous Generation of Surgical Images and Ground Truth Labels

2025-06-17 08:29:40
Niran Nataraj, Maina Sogabe, Kenji Kawashima

Abstract

Deep learning in medical imaging faces several obstacles: limited data diversity, ethical issues, high acquisition costs, and the need for precise annotations. Bleeding detection and localization during surgery are especially challenging because high-quality datasets that reflect real surgical scenarios are scarce. We propose orGAN, a GAN-based system for generating high-fidelity, annotated surgical images of bleeding. By leveraging small datasets of "mimicking organs" (synthetic models that replicate tissue properties and bleeding), our approach reduces ethical concerns and data-collection costs. orGAN builds on StyleGAN with Relational Positional Learning to simulate bleeding events realistically and to mark bleeding coordinates. A LaMa-based inpainting module then restores clean, pre-bleed visuals, enabling precise pixel-level annotations. In evaluations, a balanced dataset of orGAN and mimicking-organ images achieved 90% detection accuracy in surgical settings and up to 99% frame-level accuracy. Although our development data lack diverse organ morphologies and contain intraoperative artifacts, orGAN markedly advances the ethical, efficient, and cost-effective creation of realistic annotated bleeding datasets, supporting broader integration of AI in surgical practice.
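The abstract implies a paired-image workflow: a StyleGAN-based generator produces a synthetic bleeding frame (with bleeding coordinates), a LaMa-based inpainting module restores the clean pre-bleed frame, and the pair enables pixel-level labels. The sketch below illustrates only the final differencing step under stated assumptions; bleed_generator, inpaint_clean, and the threshold are hypothetical stand-ins, not the authors' implementation.

    # Minimal sketch (not the authors' code): derive a pixel-level bleeding label
    # from a paired (bleeding, clean) frame, assuming the clean frame was produced
    # by a LaMa-style inpainting pass over the generated bleeding frame.
    import numpy as np

    def pixel_level_bleeding_mask(bleed_img: np.ndarray,
                                  clean_img: np.ndarray,
                                  threshold: float = 0.1) -> np.ndarray:
        """Binary bleeding mask from a paired bleed/clean frame.

        bleed_img, clean_img: float arrays in [0, 1] with shape (H, W, 3).
        Returns a boolean (H, W) mask of pixels changed by the simulated bleed.
        """
        diff = np.abs(bleed_img - clean_img).mean(axis=-1)  # mean per-pixel colour change
        return diff > threshold                             # pixels altered by bleeding

    # Hypothetical usage, with stand-in models for the two pipeline stages:
    #   bleed_img, coords = bleed_generator.sample()     # StyleGAN-based generator with RPL
    #   clean_img = inpaint_clean(bleed_img, coords)     # LaMa-based inpainting module
    #   mask = pixel_level_bleeding_mask(bleed_img, clean_img)

A real pipeline would likely add threshold tuning and morphological clean-up of the mask; this sketch only shows how paired bleed/clean frames make pixel-level annotation possible.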

URL

https://arxiv.org/abs/2506.14303

PDF

https://arxiv.org/pdf/2506.14303.pdf

