Showing 1 - 9 of 9 for search: '"RJ Skerry-Ryan"'
Author:
Daisy Stanton, Matt Shannon, Soroosh Mariooryad, RJ Skerry-Ryan, Eric Battenberg, Tom Bagby, David Kao
This work explores the task of synthesizing speech in nonexistent human-sounding voices. We call this task "speaker generation", and present TacoSpawn, a system that performs competitively at this task. TacoSpawn is a recurrent attention-based text-t
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::f6e83b22f13a9e01602b62b69bb268cc
http://arxiv.org/abs/2111.05095
Published in:
Interspeech 2021.
This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with a fully differentiable duration model which does not require supervised duration signals. The duration model is based on a novel attention mechanism and
Published in:
ICASSP
We describe a sequence-to-sequence neural network which directly generates speech waveforms from text inputs. The architecture extends the Tacotron model by incorporating a normalizing flow into the autoregressive decoder loop. Output waveforms are m
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9cdc85d56435b697621120f15b6834b4
Author:
Ron Weiss, Andrew Rosenberg, RJ Skerry-Ryan, Yu Zhang, Heiga Zen, Bhuvana Ramabhadran, Zhifeng Chen, Yonghui Wu, Ye Jia
Published in:
INTERSPEECH
We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages. Moreover, the model is able to transfer voices across languages, e.g. synthesize fluent
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8526be8b339a406a410e03a2ae3a3a2e
http://arxiv.org/abs/1907.04448
Published in:
ICASSP
Although end-to-end text-to-speech (TTS) models such as Tacotron have shown excellent results, they typically require a sizable set of high-quality pairs for training, which are expensive to collect. In this paper, we propose a semi-supervised traini
Author:
Eric Battenberg, David T. H. Kao, Tom Bagby, Soroosh Mariooryad, RJ Skerry-Ryan, Daisy Stanton, Matt Shannon
Published in:
ICASSP
Despite the ability to produce human-level speech for in-domain text, attention-based end-to-end text-to-speech (TTS) systems suffer from text alignment failures that increase in frequency for out-of-domain text. We show that these failures can be ad
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::44e108a9536d4f771709c9eb9d0e9391
Published in:
ICASSP
Unitary Evolution Recurrent Neural Networks (uRNNs) have three attractive properties: (a) the unitary property, (b) the complex-valued nature, and (c) their efficient linear operators [1]. The literature so far does not address - how critical is the
Published in:
SLT
Global Style Tokens (GSTs) are a recently-proposed method to learn latent disentangled representations of high-dimensional data. GSTs can be used within Tacotron, a state-of-the-art end-to-end text-to-speech synthesis system, to uncover expressive fa
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4e7991f19ae638161f4ce6e5f4f733b6
Author:
Rif A. Saurous, RJ Skerry-Ryan, Quoc V. Le, Yonghui Wu, Navdeep Jaitly, Daisy Stanton, Ron Weiss, Robert A. J. Clark, Yuxuan Wang, Yannis Agiomyrgiannakis, Ying Xiao, Zongheng Yang, Zhifeng Chen, Samy Bengio
Published in:
INTERSPEECH
A text-to-speech synthesis system typically consists of multiple stages, such as a text analysis frontend, an acoustic model and an audio synthesis module. Building these components often requires extensive domain expertise and may contain brittle de