Abstract
Visual odometry plays a crucial role in endoscopic imaging, yet the scarcity of realistic images with ground-truth poses presents a significant challenge. Domain adaptation therefore offers a promising approach to bridge the pre-operative planning domain and the intra-operative real domain for learning odometry information. However, existing methods suffer from long training times. In this work, an efficient neural style transfer framework for endoscopic visual odometry is proposed, which compresses the time from pre-operative planning to the testing phase to less than five minutes. For efficient training, this work focuses on training modules with only a limited number of real images and exploits pre-operative prior information to dramatically reduce training duration. Moreover, during the testing phase, we propose a novel Test Time Adaptation (TTA) method to mitigate the gap in lighting conditions between the training and testing datasets. Experimental evaluations on two public endoscope datasets show that our method achieves state-of-the-art accuracy in visual odometry tasks while attaining the fastest training speed. These results demonstrate significant promise for intra-operative surgery applications.
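The abstract does not specify how its TTA method compensates for the lighting gap; one common and minimal form of test-time adaptation for illumination shift is to re-normalize each incoming frame so its per-channel statistics match those collected on the training images. The sketch below illustrates that generic idea only; the function name and the moment-matching scheme are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def match_channel_stats(frame, train_mean, train_std, eps=1e-6):
    """Illustrative test-time adaptation for lighting shift (not the
    paper's method): shift and scale each color channel of a test frame
    so its mean/std match statistics gathered from the training images."""
    frame = frame.astype(np.float64)
    mean = frame.mean(axis=(0, 1))      # per-channel mean of this test frame
    std = frame.std(axis=(0, 1)) + eps  # per-channel std of this test frame
    return (frame - mean) / std * train_std + train_mean

# Usage: an under-lit "test" frame is re-exposed to the training statistics.
rng = np.random.default_rng(0)
test_frame = rng.uniform(0.0, 0.3, size=(64, 64, 3))  # dark synthetic frame
adapted = match_channel_stats(
    test_frame,
    train_mean=np.array([0.5, 0.5, 0.5]),  # assumed training-set statistics
    train_std=np.array([0.2, 0.2, 0.2]),
)
```

After adaptation the frame's per-channel mean and standard deviation equal the training-set targets, so a model trained under the brighter conditions sees inputs in its expected range.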
Abstract (translated)
Visual odometry plays a key role in endoscopic imaging, yet the lack of photorealistic images is a significant challenge. Domain adaptation is therefore a promising way to bridge the pre-operative planning domain and the intra-operative real domain for learning pose information. However, existing methods are inefficient in training time. In this work, we propose an efficient style transfer framework for endoscopic visual odometry that compresses the time from the pre-operative planning phase to the testing phase to under five minutes. For efficient training, this work focuses on training modules with only a limited number of real images and exploits pre-operative prior information to markedly shorten training time. In addition, during the testing phase, we propose a new method named Test Time Adaptation (TTA) to bridge the difference in lighting conditions between the training and testing data. Experimental evaluation on two public endoscope datasets shows that our method achieves state-of-the-art accuracy on visual odometry tasks while having the fastest training speed. These results show that our method holds great promise for intra-operative surgical applications.
URL
https://arxiv.org/abs/2403.10860