Abstract
Code-switching speech recognition presents a formidable challenge due to the inherent difficulty of modeling phonetic similarities across languages. This study proposes Collaborative-MoE, a Mixture-of-Experts (MoE) model that leverages a collaborative mechanism among expert groups. First, a preceding routing network is explicitly trained on a Language Identification (LID) task and selects expert groups based on the learned LID weights. This provides robust routing information to the MoE layer and mitigates interference between language domains during expert parameter updates. The LID weights are also employed to facilitate inter-group collaboration, enabling the integration of language-specific representations. Furthermore, within each language-specific expert group, an unsupervised gating network fosters collaboration on attributes beyond language. Extensive experiments demonstrate the efficacy of our approach, achieving significant performance improvements over alternative methods. Importantly, our method preserves the efficient inference characteristic of MoE models without requiring additional pre-training.
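For concreteness, below is a minimal PyTorch-style sketch of how an LID-routed expert layer of this kind might be wired up, assuming a two-language setup with frame-level LID labels. The class, parameter names, and tensor shapes (CollaborativeMoE, experts_per_group, the auxiliary LID loss, etc.) are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CollaborativeMoE(nn.Module):
    """Illustrative sketch: LID-supervised routing over language expert groups,
    with an unsupervised gate inside each group (not the authors' implementation)."""

    def __init__(self, d_model=256, d_ff=1024, num_languages=2, experts_per_group=4):
        super().__init__()
        # Preceding routing network: predicts language identity per frame.
        self.lid_router = nn.Linear(d_model, num_languages)
        # One expert group per language; each expert is a small feed-forward block.
        self.groups = nn.ModuleList([
            nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
                for _ in range(experts_per_group)
            ])
            for _ in range(num_languages)
        ])
        # Unsupervised gating network inside each language expert group.
        self.group_gates = nn.ModuleList([
            nn.Linear(d_model, experts_per_group) for _ in range(num_languages)
        ])

    def forward(self, x, lid_labels=None):
        # x: (batch, time, d_model) acoustic representations.
        lid_logits = self.lid_router(x)               # frame-level LID logits
        lid_weights = F.softmax(lid_logits, dim=-1)   # used for routing and inter-group mixing

        group_outputs = []
        for experts, gate in zip(self.groups, self.group_gates):
            gate_weights = F.softmax(gate(x), dim=-1)                     # (B, T, E)
            expert_outs = torch.stack([e(x) for e in experts], dim=-1)    # (B, T, d_model, E)
            group_out = (expert_outs * gate_weights.unsqueeze(-2)).sum(-1)
            group_outputs.append(group_out)

        # Inter-group collaboration: LID-weighted combination of language-specific outputs.
        stacked = torch.stack(group_outputs, dim=-1)                      # (B, T, d_model, L)
        output = (stacked * lid_weights.unsqueeze(-2)).sum(-1)

        # Auxiliary LID loss supervises the routing network when labels are available.
        aux_loss = None
        if lid_labels is not None:
            aux_loss = F.cross_entropy(
                lid_logits.reshape(-1, lid_logits.size(-1)), lid_labels.reshape(-1)
            )
        return output, aux_loss
```

In this sketch, the routing decision comes from the LID branch rather than a free-form learned gate, which is one plausible reading of how explicit LID supervision keeps language-domain gradients from interfering with each other's expert updates.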
URL
https://arxiv.org/abs/2409.02050