Abstract
Differences in spoken language have long been a barrier to effective communication and connection, and they remain a challenge in our increasingly interconnected world. This paper introduces an end-to-end speech conversion framework for Hindi-to-English translation that culminates in the synthesis of English audio. The framework integrates XLSR Wav2Vec2 for automatic speech recognition (ASR), mBART for neural machine translation (NMT), and a Text-to-Speech (TTS) synthesis component into a single, unified approach to cross-lingual communication. We describe each component in detail, explain its individual contribution, and examine how the components work together to enable a fluid transition from spoken Hindi to synthesized English audio.
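The three-stage cascade described above (ASR, then NMT, then TTS) can be sketched as a chain of interchangeable stages. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name SpeechToSpeechPipeline, the Audio type alias, and the three stub functions are hypothetical placeholders standing in for XLSR Wav2Vec2, mBART, and the TTS component, none of which are loaded here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

# Assumption: audio is represented as a plain list of mono samples.
Audio = List[float]


@dataclass
class SpeechToSpeechPipeline:
    """Hypothetical cascade: Hindi audio -> Hindi text -> English text -> English audio."""
    asr: Callable[[Audio], str]   # stand-in for the XLSR Wav2Vec2 ASR stage
    nmt: Callable[[str], str]     # stand-in for the mBART translation stage
    tts: Callable[[str], Audio]   # stand-in for the TTS synthesis stage

    def run(self, hindi_audio: Audio) -> Dict[str, Union[str, Audio]]:
        # Each stage consumes only the output of the previous one,
        # so intermediate text is kept for inspection and debugging.
        hindi_text = self.asr(hindi_audio)
        english_text = self.nmt(hindi_text)
        english_audio = self.tts(english_text)
        return {
            "hindi_text": hindi_text,
            "english_text": english_text,
            "english_audio": english_audio,
        }


# Stub stages (illustrative only; they do no real recognition,
# translation, or synthesis).
def stub_asr(audio: Audio) -> str:
    return "नमस्ते दुनिया"


def stub_nmt(text: str) -> str:
    lookup = {"नमस्ते दुनिया": "hello world"}
    return lookup.get(text, text)


def stub_tts(text: str) -> Audio:
    # Emit 10 silent samples per character as a fake waveform.
    return [0.0] * (len(text) * 10)


pipeline = SpeechToSpeechPipeline(asr=stub_asr, nmt=stub_nmt, tts=stub_tts)
result = pipeline.run([0.0] * 16000)
print(result["hindi_text"], "->", result["english_text"])
```

Because each stage is just a callable, any one of the stubs could be swapped for a wrapper around a real model without changing the rest of the chain. One consequence of this cascaded design is that errors made by an early stage propagate to every later one.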
URL
https://arxiv.org/abs/2401.06183