Showing 1 - 10 of 14 results for search: "Viacheslav Klimkov"
Authors:
Javier Latorre, Jakub Lachowicz, Jaime Lorenzo-Trueba, Thomas Merritt, Thomas Drugman, Srikanth Ronanki, Viacheslav Klimkov
Recent speech synthesis systems based on sampling from autoregressive neural network models can generate speech almost indistinguishable from human recordings. However, these models require large amounts of data. This paper shows that the lack of da…
External link:
http://arxiv.org/abs/1811.06315
Authors:
Mikolaj Babianski, Kamil Pokora, Raahil Shah, Rafal Sienkiewicz, Daniel Korzekwa, Viacheslav Klimkov
In expressive speech synthesis, it is common practice to use latent prosody representations to deal with the variability of the data during training. The same text may correspond to various acoustic realizations, which is known as the one-to-many mapping problem…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::78737239f17a356ec56a8793c348b2d4
http://arxiv.org/abs/2301.11446
Authors:
Daniel Korzekwa, Goeric Huybrechts, Bartosz Putrycz, Viacheslav Klimkov, Kamil Pokora, Abdelhamid Ezzerg, Thomas Merritt, Raahil Shah
Published in:
11th ISCA Speech Synthesis Workshop (SSW 11).
Whilst recent neural text-to-speech (TTS) approaches produce high-quality speech, they typically require a large number of recordings from the target speaker. In previous work, a 3-step method was proposed to generate high-quality TTS while greatly r…
Authors:
Viacheslav Klimkov, Marco Nicolis
Published in:
11th ISCA Speech Synthesis Workshop (SSW 11).
Authors:
Adam Gabrys, Kamil Pokora, Daniel Saez-Trigueros, Viacheslav Klimkov, Jaime Lorenzo-Trueba, Jakub Lachowicz, Abdelhamid Ezzerg, Bartosz Putrycz, Daniel Korzekwa, David McHardy
Artificial speech synthesis has made a great leap in terms of naturalness, as recent Text-to-Speech (TTS) systems are capable of producing speech of similar quality to human recordings. However, not all speaking styles are easy to model: highly expr…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::0f3066dd01521b970701004da0d2159f
http://arxiv.org/abs/2108.06270
This paper proposes a general enhancement to the Normalizing Flows (NF) used in neural vocoding. As a case study, we improve expressive speech vocoding with a revamped Parallel WaveNet (PW). Specifically, we propose to extend the affine transformatio…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6e52d27c4aad7e6a1b774be4c49d908a
http://arxiv.org/abs/2106.08649
Authors:
Daniel Korzekwa, Yunlong Jiao, Adam Gabrys, Georgi Tinchev, Bartosz Putrycz, Viacheslav Klimkov
Published in:
ICASSP
We present a universal neural vocoder based on Parallel WaveNet, with an additional conditioning network called Audio Encoder. Our universal vocoder offers real-time, high-quality speech synthesis across a wide range of use cases. We tested it on 43 inter…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::56acc48ff12c39cbac4d85d8fc62c708
Authors:
Sri Karlapati, Viacheslav Klimkov, Daniel Saez-Trigueros, Alexis Moinet, Thomas Drugman, Arnaud Joly
Published in:
INTERSPEECH
Prosody Transfer (PT) is a technique that aims to use the prosody from source audio as a reference while synthesising speech. Fine-grained PT aims at capturing prosodic aspects like rhythm, emphasis, melody, duration, and loudness from source au…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8b74e95fa2eb409504348346e0562ff9
http://arxiv.org/abs/2004.14617
Pitch detection is a fundamental problem in speech processing, since F0 is used in a large number of applications. Recent papers have proposed deep learning for robust pitch tracking. In this letter, we consider voicing detection as a classification prob…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b37b07a02e027338b3d3deea3283c578
http://arxiv.org/abs/1903.01290
Published in:
INTERSPEECH
We present a neural text-to-speech system for fine-grained prosody transfer from one speaker to another. Conventional approaches for end-to-end prosody transfer typically use either fixed-dimensional or variable-length prosody embeddings via a seconda…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::23a5de24ec69a253e40e2df5e8da1abe