A Non-Autoregressive Network for Chinese Text to Speech and Voice Cloning

Authors: Yueqing Cai, Wenbi Rao, Chunkang Zhang
Publication year: 2021
Subject:
Source: 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA).
DOI: 10.1109/icaica52286.2021.9497934
Description: Text to speech (TTS) has been evolving rapidly in recent years. Researchers have successfully converted English text into speech that sounds like a natural speaker, proposing numerous models ranging from RNNs to non-autoregressive networks. However, migrating these models to Chinese TTS remains difficult because of the language's prosodic phrasing problems and large character set, not to mention the disappointing results of the models that have been migrated, most of which are autoregressive. In this paper, we migrate FastSpeech2 to Chinese TTS and train it with a generative adversarial network (GAN) discriminator to improve the output. The Postnet of Tacotron2 is also applied to fine-tune the mel-spectrogram. In addition, we use an x-vector-based voiceprint extraction model to extract voiceprints for voice cloning. Experiments on both models yield a mean opinion score (MOS) of 3.83 for naturalness and 3.82 for similarity.
Database: OpenAIRE