Author: |
Yasuda, Yusuke, Toda, Tomoki |
Year of Publication: |
2022 |
Subject: |
|
Source: |
IEEE Journal of Selected Topics in Signal Processing (Volume: 16, Issue: 6, October 2022) |
Document Type: |
Working Paper |
DOI: |
10.1109/JSTSP.2022.3190672 |
Description: |
End-to-end text-to-speech synthesis (TTS) can generate highly natural synthetic speech from raw text. However, rendering correct pitch accents remains a challenging problem for end-to-end TTS. To tackle the challenge of rendering correct pitch accents in Japanese end-to-end TTS, we adopt PnG BERT, a self-supervised pretrained model in the character and phoneme domain, for TTS. We investigate the effects of features captured by PnG BERT on Japanese TTS by modifying the fine-tuning condition to determine which conditions are helpful for inferring pitch accents. We manipulate the content of PnG BERT features from text-oriented to speech-oriented by changing the number of layers fine-tuned during TTS training. In addition, we teach PnG BERT pitch accent information by fine-tuning with tone prediction as an additional downstream task. Our experimental results show that the features captured by PnG BERT during pretraining contain information helpful for inferring pitch accents, and that PnG BERT outperforms a baseline Tacotron on accent correctness in a listening test. |
Database: |
arXiv |
External Link: |
|