The entire network structure of Crossmodal Transformer

2021-04-29 11:47:31

Meng Li, Changyan Lin, Lixia Shu, Xin Pu, Yi Chen, Heng Wu, Jiasong Li, Hongshuai Cao

arXiv_CV

arXiv_CV Relation Transformer Pose Medical 3D Matching

Abstract

Because the mapping between a definite intra-interventional 2D X-ray and an undefined pre-interventional 3D Computed Tomography (CT) volume is uncertain, auxiliary positioning devices or body markers, such as medical implants, are commonly used to determine it. However, such approaches cannot be widely used in clinical practice because of complex real-world conditions. To determine the mapping and obtain an initial pose estimate of the human body without auxiliary equipment or markers, a cross-modal matching transformer network is proposed that matches 2D X-ray and 3D CT images directly. The proposed approach first learns deep skeletal features from the 2D X-ray and 3D CT images. The features are then converted into 1D X-ray and CT representation vectors, which are combined using a multi-modal transformer. As a result, the trained network can directly predict the spatial correspondence between an arbitrary 2D X-ray and 3D CT. Experimental results show that, when our approach is combined with the conventional approach, the achieved accuracy and speed meet basic clinical intervention needs, and the method provides a new direction for intra-interventional registration.
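
The fusion step described in the abstract (feature maps flattened into 1D vectors, then combined by a transformer) can be illustrated with a toy sketch. This is not the authors' network: the abstract gives no layer sizes, projections, or output head, so everything below is an assumption made for illustration. The function names (`flatten`, `to_tokens`, `self_attention`, `fuse`), the token size `dim`, the identity query/key/value projections, and the mean pooling are all hypothetical, and the feature maps are plain nested lists standing in for learned skeletal features.

```python
import math


def flatten(feature_map):
    """Flatten a nested list (e.g. a 2D or 3D feature map) into a 1D vector."""
    if not isinstance(feature_map, list):
        return [float(feature_map)]
    out = []
    for item in feature_map:
        out.extend(flatten(item))
    return out


def to_tokens(vector, dim):
    """Split a 1D vector into tokens of length `dim`, zero-padding the tail."""
    padded = vector + [0.0] * (-len(vector) % dim)
    return [padded[i:i + dim] for i in range(0, len(padded), dim)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real transformer learns these projections.
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)]
        )
    return attended


def fuse(xray_features, ct_features, dim=4):
    """Fuse 2D X-ray and 3D CT feature maps into one joint vector.

    Both modalities are flattened to 1D, split into tokens, concatenated
    into a single sequence, passed through attention, and mean-pooled.
    """
    xray_tokens = to_tokens(flatten(xray_features), dim)
    ct_tokens = to_tokens(flatten(ct_features), dim)
    fused = self_attention(xray_tokens + ct_tokens)
    return [sum(token[j] for token in fused) / len(fused) for j in range(dim)]
```

Calling `fuse` on a 2x2 nested list and a 2x2x2 nested list with `dim=4` yields one X-ray token and two CT tokens, which attend to each other before pooling. A trained model would replace the identity projections with learned weights and map the pooled vector to the predicted spatial correspondence; the abstract does not specify how.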

URL

https://arxiv.org/abs/2104.14273

PDF

https://arxiv.org/pdf/2104.14273.pdf