
Global Rotation Equivariant Phase Modeling for Speech Enhancement with Deep Magnitude-Phase Interaction

2026-02-09 11:54:22
Chengzhong Wang, Andong Li, Dingding Yao, Junfeng Li

Abstract

While deep learning has advanced speech enhancement (SE), effective phase modeling remains challenging: conventional networks typically operate in a flat Euclidean feature space, which makes it hard to model the underlying circular topology of the phase. To address this, we propose a manifold-aware magnitude-phase dual-stream framework that aligns the phase stream with its intrinsic circular geometry by enforcing a Global Rotation Equivariance (GRE) property. Specifically, we introduce a Magnitude-Phase Interactive Convolutional Module (MPICM) for modulus-based information exchange and a Hybrid-Attention Dual-FFN (HADF) bottleneck for unified feature fusion, both of which are designed to preserve GRE in the phase stream. Comprehensive evaluations across phase retrieval, denoising, dereverberation, and bandwidth extension tasks validate the superiority of the proposed method over multiple advanced baselines. Notably, the proposed architecture reduces Phase Distance by over 20% in the phase retrieval task and improves PESQ by more than 0.1 in zero-shot cross-corpus denoising evaluations. The method also performs better overall on universal SE tasks involving mixed distortions. Qualitative analysis further reveals that the learned phase features exhibit distinct periodic patterns, consistent with the intrinsic circular nature of the phase. The source code is publicly available.
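To make the GRE property concrete, the following is a minimal sketch, not the authors' MPICM or HADF implementation. It assumes the usual reading of global rotation equivariance: multiplying every complex-valued phase feature by the same factor e^{jθ} before a layer gives the same result as multiplying the layer's output by e^{jθ}. The toy layer `modulus_gated_layer`, its `gate_scale` parameter, and the helper names are hypothetical and chosen only for illustration. The layer uses a bias-free complex linear mix followed by a gate that depends only on the modulus, so the rotation passes through unchanged. Adding a constant complex bias breaks the property.

```python
import cmath
import math
import random


def rotate(z, theta):
    """Apply one global phase rotation e^{j*theta} to every complex feature."""
    r = cmath.exp(1j * theta)
    return [v * r for v in z]


def modulus_gated_layer(z, weights, gate_scale=1.0):
    """Hypothetical toy layer (an assumption, not taken from the paper).

    Step 1: bias-free complex linear mixing, which commutes with a global rotation.
    Step 2: a real-valued gate computed from the modulus only, which is
            invariant to the rotation and therefore preserves equivariance.
    """
    mixed = [sum(w * v for w, v in zip(row, z)) for row in weights]
    return [m * math.tanh(gate_scale * abs(m)) for m in mixed]


def equivariance_error(layer, z, theta):
    """Largest deviation between layer(rotate(z)) and rotate(layer(z))."""
    rotated_then_mapped = layer(rotate(z, theta))
    mapped_then_rotated = rotate(layer(z), theta)
    return max(abs(a - b) for a, b in zip(rotated_then_mapped, mapped_then_rotated))


def make_demo(seed=0, n=4):
    """Random complex features and mixing weights for the demonstration."""
    rng = random.Random(seed)
    z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    weights = [
        [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        for _ in range(n)
    ]
    return z, weights


z, weights = make_demo()


def gre_layer(x):
    return modulus_gated_layer(x, weights)


def biased_layer(x):
    # A constant complex bias does not rotate with the input, so GRE is lost.
    return [v + (0.5 + 0.5j) for v in gre_layer(x)]


err_gre = equivariance_error(gre_layer, z, theta=1.234)
err_biased = equivariance_error(biased_layer, z, theta=1.234)
print(f"bias-free, modulus-gated layer: {err_gre:.2e}")
print(f"same layer with complex bias:   {err_biased:.2e}")
```

The first error should sit at floating-point round-off, while the second is clearly nonzero. The same check can be applied to any candidate phase-stream operation to test whether it respects a global rotation.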


URL

https://arxiv.org/abs/2602.08556

PDF

https://arxiv.org/pdf/2602.08556.pdf

