End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2

2024-01-11 04:26:21

Aniket Tathe, Anand Kamble, Suyash Kumbharkar, Atharva Bhandare, Anirban C. Mitra

arXiv_AI


Abstract

Speech has long been a barrier to effective communication and connection, persisting as a challenge in our increasingly interconnected world. This research paper introduces a transformative solution to this persistent obstacle: an end-to-end speech conversion framework tailored for Hindi-to-English translation, culminating in the synthesis of English audio. By integrating cutting-edge technologies such as XLSR Wav2Vec2 for automatic speech recognition (ASR), mBART for neural machine translation (NMT), and a Text-to-Speech (TTS) synthesis component, this framework offers a unified and seamless approach to cross-lingual communication. We delve into the intricate details of each component, elucidating their individual contributions and exploring the synergies that enable a fluid transition from spoken Hindi to synthesized English audio.
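
The three-stage cascade described in the abstract (ASR, then NMT, then TTS) can be sketched as a simple composition of functions. This is a minimal illustration, not the authors' implementation: the class name `SpeechToSpeechPipeline`, the type aliases, and the three stub functions are all hypothetical. In a real system the stubs would be replaced by a fine-tuned XLSR Wav2Vec2 recognizer, an mBART translator, and a TTS model such as Bark.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases for illustration only (not from the paper).
Audio = List[float]                      # mono waveform samples
Recognizer = Callable[[Audio], str]      # Hindi audio -> Hindi text (ASR)
Translator = Callable[[str], str]        # Hindi text -> English text (NMT)
Synthesizer = Callable[[str], Audio]     # English text -> English audio (TTS)


@dataclass
class SpeechToSpeechPipeline:
    """Chains ASR, NMT and TTS stages into one callable."""
    recognize: Recognizer
    translate: Translator
    synthesize: Synthesizer

    def __call__(self, hindi_audio: Audio) -> Audio:
        hindi_text = self.recognize(hindi_audio)     # stage 1: ASR
        english_text = self.translate(hindi_text)    # stage 2: NMT
        return self.synthesize(english_text)         # stage 3: TTS


# Stub stages standing in for the real models (assumptions, not real APIs).
def stub_asr(audio: Audio) -> str:
    return "नमस्ते"  # pretend the recognizer heard this Hindi word


def stub_nmt(text: str) -> str:
    return {"नमस्ते": "hello"}.get(text, "")  # toy lookup "translation"


def stub_tts(text: str) -> Audio:
    return [0.0] * len(text)  # one silent sample per character


pipeline = SpeechToSpeechPipeline(stub_asr, stub_nmt, stub_tts)
english_audio = pipeline([0.1, -0.2, 0.3])
```

Because each stage is just a callable, any one of them can be swapped out or evaluated in isolation without touching the other two. The trade-off of a cascade like this is that errors made in an early stage carry through to the later ones.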


URL

https://arxiv.org/abs/2401.06183

PDF

https://arxiv.org/pdf/2401.06183.pdf
