Abstract
While deep learning has advanced speech enhancement (SE), effective phase modeling remains challenging: conventional networks typically operate in a flat Euclidean feature space, which is poorly suited to modeling the circular topology of phase. To address this, we propose a manifold-aware magnitude-phase dual-stream framework that aligns the phase stream with its intrinsic circular geometry by enforcing Global Rotation Equivariance (GRE). Specifically, we introduce a Magnitude-Phase Interactive Convolutional Module (MPICM) for modulus-based information exchange and a Hybrid-Attention Dual-FFN (HADF) bottleneck for unified feature fusion, both designed to preserve GRE in the phase stream. We evaluate the method on phase retrieval, denoising, dereverberation, and bandwidth extension against multiple advanced baselines. Notably, the proposed architecture reduces Phase Distance by over 20% in the phase retrieval task and improves PESQ by more than 0.1 in zero-shot cross-corpus denoising evaluations. It also outperforms the baselines on universal SE tasks involving mixed distortions. Qualitative analysis further reveals that the learned phase features exhibit distinct periodic patterns, consistent with the intrinsic circular nature of phase. The source code is publicly available.
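The GRE property can be illustrated with a small numerical check. The sketch below is not the MPICM or HADF module from the paper. It assumes that GRE means that rotating every input phase by a constant angle rotates the output by the same angle, and it uses a hypothetical layer, `modulus_gated_layer`, built from a bias-free complex linear map and a gate that depends only on the modulus. All names and shapes here are illustrative assumptions.

```python
import numpy as np


def modulus_gated_layer(z, weight):
    """Hypothetical phase-stream layer (an assumption, not the paper's module).

    A bias-free complex linear map followed by a gate computed from the
    modulus only, so a global phase rotation passes through unchanged.
    """
    h = z @ weight                 # complex linear map, no bias term
    gate = np.tanh(np.abs(h))      # real-valued gate, depends only on |h|
    return gate * h


def global_rotation(z, theta):
    """Rotate the phase of every element by the same angle theta."""
    return np.exp(1j * theta) * z


rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
weight = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
theta = 0.7

# Equivariance: rotating the input then applying the layer should match
# applying the layer then rotating the output.
rotate_then_apply = modulus_gated_layer(global_rotation(z, theta), weight)
apply_then_rotate = global_rotation(modulus_gated_layer(z, weight), theta)
equivariance_error = np.max(np.abs(rotate_then_apply - apply_then_rotate))

# Counterexample: an additive bias does not rotate with the input,
# so the same check fails for a biased linear map.
bias = np.ones(8, dtype=complex)
biased_error = np.max(np.abs(
    (global_rotation(z, theta) @ weight + bias)
    - global_rotation(z @ weight + bias, theta)
))

print(f"modulus-gated layer error: {equivariance_error:.2e}")
print(f"biased linear layer error: {biased_error:.2e}")
```

The first error should sit at floating-point precision, while the second should be clearly nonzero. This suggests why operations acting on the modulus are a natural fit for a phase stream that has to respect the circular geometry.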
URL
https://arxiv.org/abs/2602.08556