Abstract
This study investigates whether phonological features can be used as input to text-to-speech (TTS) systems to generate native and non-native speech. We present a mapping from ARPABET and pinyin to SAMPA and SAMPA-SC, respectively, and from these to phonological features, and test whether native, non-native, and code-switched speech can be successfully generated using this mapping. We ran two experiments, one with a small dataset and one with a larger dataset. The results show that phonological features are a feasible input representation, although further investigation is needed to improve model performance. The accented output generated by the TTS models also helps in understanding human second language acquisition processes.
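As a rough illustration of the two-stage mapping described in the abstract, the sketch below converts a few ARPABET phones to SAMPA symbols and then to binary feature vectors. The tables, the six-feature inventory, and the function name `arpabet_to_feature_vectors` are assumptions made for illustration only; they are not the paper's actual mapping or feature set.

```python
# Illustrative sketch only: tiny hand-picked tables, NOT the paper's mapping.

# Stage 1: ARPABET -> SAMPA (small subset of standard correspondences).
ARPABET_TO_SAMPA = {
    "P": "p", "B": "b", "M": "m", "S": "s", "IY": "i", "AA": "A",
}

# Hypothetical feature inventory; the paper's real inventory may differ.
FEATURES = ("consonantal", "voiced", "nasal", "labial", "high", "low")

# Stage 2: SAMPA -> set of active phonological features (assumed values).
SAMPA_TO_FEATURES = {
    "p": {"consonantal", "labial"},
    "b": {"consonantal", "voiced", "labial"},
    "m": {"consonantal", "voiced", "nasal", "labial"},
    "s": {"consonantal"},
    "i": {"voiced", "high"},
    "A": {"voiced", "low"},
}


def arpabet_to_feature_vectors(phones):
    """Map ARPABET phones to binary feature vectors via SAMPA."""
    vectors = []
    for phone in phones:
        base = phone.rstrip("012")        # drop ARPABET stress digits
        sampa = ARPABET_TO_SAMPA[base]    # stage 1: ARPABET -> SAMPA
        active = SAMPA_TO_FEATURES[sampa] # stage 2: SAMPA -> features
        vectors.append([1 if f in active else 0 for f in FEATURES])
    return vectors


if __name__ == "__main__":
    for phone, vec in zip(["M", "AA1"], arpabet_to_feature_vectors(["M", "AA1"])):
        print(phone, vec)
```

Under the same assumptions, a pinyin to SAMPA-SC table could feed the same second stage, so that phones from both languages end up in one shared feature space.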
URL
https://arxiv.org/abs/2110.03609