Paper Reading AI Learner

PK-YOLO: Pretrained Knowledge Guided YOLO for Brain Tumor Detection in Multiplanar MRI Slices

2024-10-29 07:45:59
Ming Kang, Fung Fung Ting, Raphaël C.-W. Phan, Chee-Ming Ting

Abstract

Brain tumor detection in multiplanar Magnetic Resonance Imaging (MRI) slices is challenging because of the varied appearance and structural relationships across the different imaging planes. In this paper, we propose a new You Only Look Once (YOLO)-based detection model that incorporates Pretrained Knowledge (PK), called PK-YOLO, to improve brain tumor detection in multiplanar MRI slices. To the best of our knowledge, PK-YOLO is the first pretrained-knowledge-guided YOLO-based object detector. The main components of the method are a pure lightweight convolutional neural network backbone pretrained via sparse masked modeling, a YOLO architecture built on this pretrained backbone, and a regression loss function designed to improve small-object detection. The pretrained backbone allows features learned from individual-plane MRI slices to transfer into the model encoders, and the resulting domain knowledge improves in-domain detection. The improved loss function further boosts detection performance on small brain tumors in multiplanar two-dimensional MRI slices. Experimental results show that PK-YOLO achieves competitive performance on multiplanar MRI brain tumor detection datasets compared with state-of-the-art YOLO-like and DETR-like object detectors. The code is available at this https URL.
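
To make the abstract's three components concrete, below is a minimal, hypothetical PyTorch sketch: a lightweight convolutional backbone (standing in for the backbone pretrained via sparse masked modeling), a YOLO-style dense detection head on top of it, and a size-aware box regression loss that up-weights small targets. All class names, the checkpoint path, and the particular loss weighting are illustrative assumptions, not the authors' implementation; the official code is at the repository linked below.

```python
# Hypothetical sketch of the ideas described in the abstract (not the authors' code):
# a lightweight convolutional backbone loaded from a separate pretraining stage,
# a YOLO-style detection head, and a small-object-friendly box regression loss.
import torch
import torch.nn as nn


class LightweightConvBackbone(nn.Module):
    """Plain convolutional feature extractor; stands in for the pretrained backbone."""

    def __init__(self, in_channels: int = 1, width: int = 32):
        super().__init__()
        self.stages = nn.Sequential(
            nn.Conv2d(in_channels, width, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(width * 2, width * 4, 3, stride=2, padding=1), nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stages(x)


class YOLOStyleDetector(nn.Module):
    """Backbone plus a single dense head predicting (x, y, w, h, objectness) per cell."""

    def __init__(self, backbone: nn.Module, feat_channels: int = 128):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Conv2d(feat_channels, 5, kernel_size=1)  # 4 box terms + objectness

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))


def small_object_regression_loss(pred_boxes, target_boxes, eps: float = 1e-7):
    """Size-aware L1 box loss: errors on small targets get larger weights.
    A stand-in for the paper's improved regression loss, not its definition."""
    wh = target_boxes[..., 2:4]
    weight = 1.0 / (wh.prod(dim=-1).sqrt() + eps)  # smaller boxes -> larger weight
    weight = weight / weight.mean()                # normalize weights around 1
    return (weight.unsqueeze(-1) * (pred_boxes - target_boxes).abs()).mean()


if __name__ == "__main__":
    backbone = LightweightConvBackbone(in_channels=1)
    # Hypothetical checkpoint produced by a sparse-masked-modeling pretraining run:
    # backbone.load_state_dict(torch.load("smm_pretrained_backbone.pt"))
    model = YOLOStyleDetector(backbone)
    slices = torch.randn(2, 1, 256, 256)   # batch of single-channel MRI slices
    print(model(slices).shape)             # -> torch.Size([2, 5, 32, 32])
```

The reciprocal-root-area weighting above is just one simple way to emphasize small boxes during regression; the paper's actual loss formulation and the pretrained-weight transfer details should be taken from the linked repository.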

URL

https://arxiv.org/abs/2410.21822

PDF

https://arxiv.org/pdf/2410.21822.pdf

