Search results - "Korzekwa, Daniel"

Report

Authors: Mosiński, Jakub; Biliński, Piotr; Merritt, Thomas; Ezzerg, Abdelhamid; Korzekwa, Daniel

Recently normalizing flows have been gaining traction in text-to-speech (TTS) and voice conversion (VC) due to their state-of-the-art (SOTA) performance. Normalizing flows are unsupervised generative models. In this paper, we introduce supervision to

External link: http://arxiv.org/abs/2312.16552

Report

Creating New Voices using Normalizing Flows

Authors: Bilinski, Piotr; Merritt, Thomas; Ezzerg, Abdelhamid; Pokora, Kamil; Cygert, Sebastian; Yanagisawa, Kayoko; Barra-Chicote, Roberto; Korzekwa, Daniel

Published in: Interspeech 2022, pp. 2958-2962

Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in

External link: http://arxiv.org/abs/2312.14569

Report

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

Authors: Zhang, Guangyan; Merritt, Thomas; Ribeiro, Manuel Sam; Tura-Vecino, Biel; Yanagisawa, Kayoko; Pokora, Kamil; Ezzerg, Abdelhamid; Cygert, Sebastian; Abbas, Ammar; Bilinski, Piotr; Barra-Chicote, Roberto; Korzekwa, Daniel; Lorenzo-Trueba, Jaime

Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space. Aiming to improve those assumptions, Normalizing Flows and Diffusion Probabilistic Models were recently

External link: http://arxiv.org/abs/2307.16679

Report

On granularity of prosodic representations in expressive text-to-speech

Authors: Babianski, Mikolaj; Pokora, Kamil; Shah, Raahil; Sienkiewicz, Rafal; Korzekwa, Daniel; Klimkov, Viacheslav

Published in: 2022 IEEE Spoken Language Technology Workshop (SLT), pp. 892-899

In expressive speech synthesis it is widely adopted to use latent prosody representations to deal with variability of the data during training. Same text may correspond to various acoustic realizations, which is known as a one-to-many mapping problem

External link: http://arxiv.org/abs/2301.11446

Report

Remap, warp and attend: Non-parallel many-to-many accent conversion with Normalizing Flows

Authors: Ezzerg, Abdelhamid; Merritt, Thomas; Yanagisawa, Kayoko; Bilinski, Piotr; Proszewska, Magdalena; Pokora, Kamil; Korzeniowski, Renard; Barra-Chicote, Roberto; Korzekwa, Daniel

Regional accents of the same language affect not only how words are pronounced (i.e., phonetic content), but also impact prosodic aspects of speech such as speaking rate and intonation. This paper investigates a novel flow-based approach to accent co

External link: http://arxiv.org/abs/2211.05850

Report

Automated detection of pronunciation errors in non-native English speech employing deep learning

Author: Korzekwa, Daniel

Despite significant advances in recent years, the existing Computer-Assisted Pronunciation Training (CAPT) methods detect pronunciation errors with a relatively low accuracy (precision of 60% at 40%-80% recall). This Ph.D. work proposes novel deep le

External link: http://arxiv.org/abs/2209.06265

Report

Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need

Authors: Korzekwa, Daniel; Lorenzo-Trueba, Jaime; Drugman, Thomas; Kostek, Bozena

The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the

External link: http://arxiv.org/abs/2207.00774

Report

Text-free non-parallel many-to-many voice conversion using normalising flows

Authors: Merritt, Thomas; Ezzerg, Abdelhamid; Biliński, Piotr; Proszewska, Magdalena; Pokora, Kamil; Barra-Chicote, Roberto; Korzekwa, Daniel

Non-parallel voice conversion (VC) is typically achieved using lossy representations of the source speech. However, ensuring only speaker identity information is dropped whilst all other information from the source speech is retained is a large chall

External link: http://arxiv.org/abs/2203.08009

Report

Enhancing audio quality for expressive Neural Text-to-Speech

Authors: Ezzerg, Abdelhamid; Gabrys, Adam; Putrycz, Bartosz; Korzekwa, Daniel; Saez-Trigueros, Daniel; McHardy, David; Pokora, Kamil; Lachowicz, Jakub; Lorenzo-Trueba, Jaime; Klimkov, Viacheslav

Artificial speech synthesis has made a great leap in terms of naturalness as recent Text-to-Speech (TTS) systems are capable of producing speech with similar quality to human recordings. However, not all speaking styles are easy to model: highly expr

External link: http://arxiv.org/abs/2108.06270

Report

Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech

Authors: Shah, Raahil; Pokora, Kamil; Ezzerg, Abdelhamid; Klimkov, Viacheslav; Huybrechts, Goeric; Putrycz, Bartosz; Korzekwa, Daniel; Merritt, Thomas

Whilst recent neural text-to-speech (TTS) approaches produce high-quality speech, they typically require a large amount of recordings from the target speaker. In previous work, a 3-step method was proposed to generate high-quality TTS while greatly r

External link: http://arxiv.org/abs/2106.12896
