Boosting Star-GANs for Voice Conversion with Contrastive Discriminator

2022-09-21 03:34:22

Shijing Si, Jianzong Wang, Xulong Zhang, Xiaoyang Qu, Ning Cheng, Jing Xiao

arXiv_SD

Abstract
Abstract (translated)
URL
PDF

Abstract

Nonparallel multi-domain voice conversion methods such as the StarGAN-VCs have been widely applied in many scenarios. However, the training of these models usually poses a challenge due to their complicated adversarial network architectures. To address this, in this work we leverage the state-of-the-art contrastive learning techniques and incorporate an efficient Siamese network structure into the StarGAN discriminator. Our method is called SimSiam-StarGAN-VC and it boosts the training stability and effectively prevents the discriminator overfitting issue in the training process. We conduct experiments on the Voice Conversion Challenge (VCC 2018) dataset, plus a user study to validate the performance of our framework. Our experimental results show that SimSiam-StarGAN-VC significantly outperforms existing StarGAN-VC methods in terms of both the objective and subjective metrics.

Abstract (translated)

URL

https://arxiv.org/abs/2209.10088

PDF

https://arxiv.org/pdf/2209.10088.pdf