Efficient Domain Adaptation for Endoscopic Visual Odometry

2024-03-16 08:57:00
Junyang Wu, Yun Gu, Guang-Zhong Yang

Abstract

Visual odometry plays a crucial role in endoscopic imaging, yet the scarcity of realistic images with ground-truth poses presents a significant challenge. Domain adaptation therefore offers a promising approach to bridge the pre-operative planning domain with the intra-operative real domain for learning odometry information. However, existing methodologies suffer from inefficient training. In this work, an efficient neural style transfer framework for endoscopic visual odometry is proposed, which compresses the time from pre-operative planning to the testing phase to less than five minutes. For efficient training, this work focuses on training modules with only a limited number of real images and exploits pre-operative prior information to dramatically reduce training duration. Moreover, during the testing phase, we propose a novel Test Time Adaptation (TTA) method to mitigate the gap in lighting conditions between the training and testing datasets. Experimental evaluations conducted on two public endoscope datasets show that our method achieves state-of-the-art accuracy in visual odometry tasks while delivering the fastest training speed. These results demonstrate significant promise for intra-operative surgery applications.
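The abstract does not spell out the TTA procedure, so the following is only a minimal sketch of the general idea of closing a lighting gap at test time: aligning each incoming frame's per-channel intensity statistics to reference statistics gathered from the (style-transferred) training domain. The function match_channel_stats and the reference values below are illustrative assumptions, not the method proposed in the paper.

```python
# Generic sketch of test-time lighting adaptation via channel-statistics
# matching. NOT the paper's TTA method; all names and values are
# illustrative assumptions.
import numpy as np

def match_channel_stats(frame, ref_mean, ref_std, eps=1e-6):
    """Shift/scale each color channel of `frame` (H, W, 3 float array in
    [0, 1]) so its mean and std match the reference (training) domain."""
    out = np.empty_like(frame)
    for c in range(frame.shape[2]):
        ch = frame[..., c]
        out[..., c] = (ch - ch.mean()) / (ch.std() + eps) * ref_std[c] + ref_mean[c]
    return np.clip(out, 0.0, 1.0)

# The reference statistics would be computed once over the style-transferred
# training images; hypothetical values are used here for demonstration.
ref_mean = np.array([0.45, 0.30, 0.25])
ref_std = np.array([0.20, 0.15, 0.12])
test_frame = np.random.rand(256, 256, 3)  # stand-in for an endoscopic frame
adapted = match_channel_stats(test_frame, ref_mean, ref_std)
```

In such a scheme, each test frame would be normalized this way before being fed to the odometry network, so that intra-operative lighting variation no longer shifts the input distribution away from what the network saw during training.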


URL

https://arxiv.org/abs/2403.10860

PDF

https://arxiv.org/pdf/2403.10860.pdf

