Phi-SegNet: Phase-Integrated Supervision for Medical Image Segmentation

2026-01-22 16:00:41
Shams Nafisa Ali, Taufiq Hasan

Abstract

Deep learning has substantially advanced medical image segmentation, yet achieving robust generalization across diverse imaging modalities and anatomical structures remains a major challenge. A key contributor to this limitation is that existing architectures, ranging from CNNs to Transformers and their hybrids, primarily encode spatial information while overlooking frequency-domain representations that capture rich structural and textural cues. Although a few recent studies have begun exploring spectral information at the feature level, supervision-level integration of frequency cues, which is crucial for fine-grained object localization, remains largely untapped. To this end, we propose Phi-SegNet, a CNN-based architecture that incorporates phase-aware information at both the architectural and optimization levels. The network integrates Bi-Feature Mask Former (BFMF) modules that blend neighboring encoder features to reduce semantic gaps, and Reverse Fourier Attention (RFA) blocks that refine decoder outputs using phase-regularized features. A dedicated phase-aware loss aligns these features with structural priors, forming a closed feedback loop that emphasizes boundary precision. Evaluated on five public datasets spanning X-ray, ultrasound (US), histopathology, MRI, and colonoscopy, Phi-SegNet consistently achieves state-of-the-art performance, with an average relative improvement of 1.54 ± 1.26% in IoU and 0.98 ± 0.71% in F1-score over the next best-performing model. In cross-dataset generalization scenarios involving unseen datasets from the known domain, Phi-SegNet also exhibits robust and superior performance, highlighting its adaptability and modality-agnostic design. These findings demonstrate the potential of leveraging spectral priors in both feature representation and supervision, paving the way for generalized segmentation frameworks that excel in fine-grained object localization.
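
The abstract does not give the form of the phase-aware loss, so the following is only a minimal sketch of what supervising on Fourier phase could look like, not the authors' formulation. It assumes a 2D mask, compares the phase of its discrete Fourier transform with that of the ground truth, and weights each frequency by the ground-truth magnitude so that near-empty frequency bins do not dominate. The names `phase_spectrum` and `phase_alignment_loss` and the toy masks are hypothetical.

```python
import numpy as np


def phase_spectrum(mask):
    """Phase angle (radians) of the 2D discrete Fourier transform of a mask."""
    return np.angle(np.fft.fft2(mask))


def phase_alignment_loss(pred, target, eps=1e-8):
    """Magnitude-weighted mean of 1 - cos(phase difference).

    Illustrative assumption only: the paper's actual phase-aware loss
    is not specified in the abstract.
    """
    diff = phase_spectrum(pred) - phase_spectrum(target)
    weights = np.abs(np.fft.fft2(target))  # trust phase where energy exists
    return float(np.sum(weights * (1.0 - np.cos(diff))) / (np.sum(weights) + eps))


# Toy masks: a square, and the same square shifted three pixels sideways.
target = np.zeros((32, 32))
target[8:24, 8:24] = 1.0
shifted = np.roll(target, 3, axis=1)

loss_same = phase_alignment_loss(target, target)
loss_shifted = phase_alignment_loss(shifted, target)
```

A spatial shift leaves the Fourier magnitude unchanged and alters only the phase, so a misplaced boundary raises this loss even though the shape itself is identical. This illustrates why phase is a natural carrier of localization information.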

URL

https://arxiv.org/abs/2601.16064

PDF

https://arxiv.org/pdf/2601.16064.pdf

