MT Metrics Correlate with Human Ratings of Simultaneous Speech Translation

2022-11-16 03:03:56

Dominik Macháček, Ondřej Bojar, Raj Dabre

arXiv_AI

arXiv_AI Relation Speech

Abstract

Several studies have examined the correlation between human ratings and metrics such as BLEU, chrF2 and COMET in machine translation. Most, if not all, consider full-sentence translation. It is unclear whether Continuous Rating (CR), a human rating of simultaneous speech translation, correlates with these metrics. Therefore, we conduct an extensive correlation analysis of CR and the aforementioned automatic metrics on evaluations of candidate systems from the English-German simultaneous speech translation task at IWSLT 2022. Our studies reveal that the offline MT metrics correlate with CR and can be reliably used for evaluating machine translation in the simultaneous mode, with some limitations on the test set size. This implies that automatic metrics can be used as proxies for CR, thereby alleviating the need for human evaluation.
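
As a rough illustration of the kind of correlation analysis the abstract describes, the sketch below computes a Pearson correlation coefficient between per-system automatic metric scores and human CR scores. The abstract does not say which correlation coefficient or aggregation level the authors use, so the choice of Pearson is an assumption. The function name `pearson` and the variables `metric_scores` and `cr_ratings` are hypothetical, and the numbers are placeholders invented for the example, not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values, invented for illustration only; NOT data from the paper.
# One entry per hypothetical candidate system.
metric_scores = [1.0, 2.0, 3.0, 4.0]  # e.g. an automatic metric score per system
cr_ratings = [2.0, 4.0, 6.0, 8.0]     # e.g. a mean CR score per system

r = pearson(metric_scores, cr_ratings)
print(f"Pearson r = {r:.3f}")
```

With only a handful of systems or a small test set, such a coefficient is unstable, which is in line with the abstract's caveat about test set size.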

URL

https://arxiv.org/abs/2211.08633

PDF

https://arxiv.org/pdf/2211.08633.pdf