Showing 1 - 10 of 12 for search: '"Shirahata, Yuma"'
We propose a novel description-based controllable text-to-speech (TTS) method with cross-lingual control capability. To address the lack of audio-description paired data in the target language, we combine a TTS model trained on the target language with …
External link:
http://arxiv.org/abs/2409.17452
We propose UNIVERSE++, a universal speech enhancement method based on score-based diffusion and adversarial training. Specifically, we improve the existing UNIVERSE model that decouples clean speech feature extraction and diffusion. Our contributions …
External link:
http://arxiv.org/abs/2406.12194
This paper proposes an audio-conditioned phonemic and prosodic annotation model for building text-to-speech (TTS) datasets from unlabeled speech samples. For creating a TTS dataset that consists of label-speech paired data, the proposed annotation model …
External link:
http://arxiv.org/abs/2406.08111
We introduce LibriTTS-P, a new corpus based on LibriTTS-R that includes utterance-level descriptions (i.e., prompts) of speaking style and speaker-level prompts of speaker characteristics. We employ a hybrid approach to construct prompt annotations: …
External link:
http://arxiv.org/abs/2406.07969
Author:
Shimizu, Reo, Yamamoto, Ryuichi, Kawamura, Masaya, Shirahata, Yuma, Doi, Hironori, Komatsu, Tatsuya, Tachibana, Kentaro
We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that allows control over speaker identity using natural language descriptions. To control speaker identity within the prompt-based TTS framework, we introduce the concept of …
External link:
http://arxiv.org/abs/2309.08140
We propose a lightweight end-to-end text-to-speech model using multi-band generation and inverse short-time Fourier transform. Our model is based on VITS, a high-quality end-to-end text-to-speech model, but adopts two changes for more efficient inference …
External link:
http://arxiv.org/abs/2210.15975
Author:
Shirahata, Yuma, Yamamoto, Ryuichi, Song, Eunwoo, Terashima, Ryo, Kim, Jae-Min, Tachibana, Kentaro
Several fully end-to-end text-to-speech (TTS) models have been proposed that have shown better performance compared to cascade models (i.e., training acoustic and vocoder models separately). However, they often generate unstable pitch contour with audible …
External link:
http://arxiv.org/abs/2210.15964
Author:
Terashima, Ryo, Yamamoto, Ryuichi, Song, Eunwoo, Shirahata, Yuma, Yoon, Hyun-Wook, Kim, Jae-Min, Tachibana, Kentaro
Data augmentation via voice conversion (VC) has been successfully applied to low-resource expressive text-to-speech (TTS) when only neutral data for the target speaker are available. Although the quality of VC is crucial for this approach, it is challenging …
External link:
http://arxiv.org/abs/2204.10020
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
We propose a lightweight end-to-end text-to-speech model using multi-band generation and inverse short-time Fourier transform. Our model is based on VITS, a high-quality end-to-end text-to-speech model, but adopts two changes for more efficient inference …
Author:
Shirahata, Yuma, Yamamoto, Ryuichi, Song, Eunwoo, Terashima, Ryo, Kim, Jae-Min, Tachibana, Kentaro
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Several fully end-to-end text-to-speech (TTS) models have been proposed that have shown better performance compared to cascade models (i.e., training acoustic and vocoder models separately). However, they often generate unstable pitch contour with audible …