A Textless Metric for Speech-to-Speech Comparison

2022-10-21 09:28:54

Laurent Besacier, Swen Ribeiro, Olivier Galibert, Ioan Calapodescu

arXiv_CL

Abstract

This paper proposes a textless speech-to-speech comparison metric that compares a speech hypothesis with a speech reference without falling back on their text transcripts. We leverage recently proposed speech2unit encoders (such as HuBERT) to pseudo-transcribe the speech utterances into discrete acoustic units, and we propose a simple neural architecture that learns a speech-based metric that correlates well with its text-based counterpart. Such a textless metric could ultimately be useful for evaluating speech-to-speech translation, in particular for oral languages or languages with no reliable ASR system available.
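
To make the idea of comparing utterances at the level of discrete units concrete, here is a minimal sketch in Python. It is not the learned neural metric the abstract describes: it swaps in a plain normalized edit distance over unit sequences as a naive stand-in. The unit IDs are assumed to have already been produced by some speech2unit encoder, and the function names (`collapse_repeats`, `edit_distance`, `unit_similarity`) are hypothetical, introduced only for illustration.

```python
from itertools import groupby


def collapse_repeats(units):
    """Merge runs of identical consecutive unit IDs into a single unit."""
    return [u for u, _ in groupby(units)]


def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def unit_similarity(hyp_units, ref_units):
    """Naive similarity in [0, 1]; an assumption-laden stand-in, not the paper's metric."""
    hyp = collapse_repeats(hyp_units)
    ref = collapse_repeats(ref_units)
    if not hyp and not ref:
        return 1.0
    return 1.0 - edit_distance(hyp, ref) / max(len(hyp), len(ref))


# Hypothetical unit sequences, as if pseudo-transcribed from two utterances.
hypothesis = [5, 5, 7, 2]
reference = [5, 7, 7, 2]
print(unit_similarity(hypothesis, reference))
```

Collapsing repeated units before comparison is a design choice made here so that differences in unit duration alone do not count as errors; a learned metric such as the one proposed in the paper may treat the unit sequences differently.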

URL

https://arxiv.org/abs/2210.11835

PDF

https://arxiv.org/pdf/2210.11835.pdf