A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation

2022-08-08 16:11:26

Linh The Nguyen, Nguyen Luong Tran, Long Doan, Manh Luong, Dat Quoc Nguyen

arXiv_CL

Abstract

In this paper, we introduce a high-quality and large-scale benchmark dataset for English-Vietnamese speech translation with 508 audio hours, consisting of 331K triplets of (sentence-length audio, English source transcript sentence, Vietnamese target subtitle sentence). We also conduct empirical experiments using strong baselines and find that the traditional "Cascaded" approach still outperforms the modern "End-to-End" approach. To the best of our knowledge, this is the first large-scale English-Vietnamese speech translation study. We hope that both our publicly available dataset and our study can serve as a starting point for future research and applications on English-Vietnamese speech translation. Our dataset is available at this https URL
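
As a rough illustration of the triplet structure described in the abstract, the sketch below models one dataset entry and estimates the mean clip length implied by the reported totals. The class name `Triplet`, its field names, and the helper `mean_clip_seconds` are hypothetical and are not taken from the released dataset. The abstract gives "331K" as a rounded count, so the result is only an approximation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One dataset entry; field names are assumptions, not the dataset's schema."""
    audio_path: str   # hypothetical: location of a sentence-length audio clip
    english: str      # English source transcript sentence
    vietnamese: str   # Vietnamese target subtitle sentence


def mean_clip_seconds(total_hours: float, num_clips: int) -> float:
    """Average clip duration in seconds, given total audio hours and clip count."""
    return total_hours * 3600 / num_clips


# Figures from the abstract: 508 audio hours, about 331K triplets (rounded).
TOTAL_HOURS = 508
NUM_TRIPLETS = 331_000

if __name__ == "__main__":
    average = mean_clip_seconds(TOTAL_HOURS, NUM_TRIPLETS)
    print(f"Approximate mean clip length: {average:.1f} s")
```

Under these assumptions the average clip is roughly five and a half seconds long, which is consistent with sentence-length audio.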

URL

https://arxiv.org/abs/2208.04243

PDF

https://arxiv.org/pdf/2208.04243.pdf