Showing 1 - 9 of 9
for search: '"Łajszczak, Mateusz"'
Author:
Łajszczak, Mateusz, Cámbara, Guillermo, Li, Yang, Beyhan, Fatih, van Korlaar, Arent, Yang, Fan, Joly, Arnaud, Martín-Cortinas, Álvaro, Abbas, Ammar, Michalski, Adam, Moinet, Alexis, Karlapati, Sri, Muszyńska, Ewa, Guo, Haohan, Putrycz, Bartosz, Gambino, Soledad López, Yoo, Kayeon, Sokolova, Elena, Drugman, Thomas
We introduce a text-to-speech (TTS) model called BASE TTS, which stands for Big Adaptive Streamable TTS with Emergent abilities. BASE TTS is the largest TTS model to date, trained on 100K hours of public domain…
External link:
http://arxiv.org/abs/2402.08093
Author:
Martín-Cortinas, Álvaro, Sáez-Trigueros, Daniel, Vallés-Pérez, Iván, Tura-Vecino, Biel, Biliński, Piotr, Lajszczak, Mateusz, Beringer, Grzegorz, Barra-Chicote, Roberto, Lorenzo-Trueba, Jaime
Large Language Models (LLMs) are one of the most promising technologies for the next era of speech generation systems, due to their scalability and in-context learning capabilities. Nevertheless, they suffer from multiple stability issues at inference…
External link:
http://arxiv.org/abs/2402.03407
Author:
Joly, Arnaud, Nicolis, Marco, Peterova, Ekaterina, Lombardi, Alessandro, Abbas, Ammar, van Korlaar, Arent, Hussain, Aman, Sharma, Parul, Moinet, Alexis, Lajszczak, Mateusz, Karanasou, Penny, Bonafonte, Antonio, Drugman, Thomas, Sokolova, Elena
We present a scalable method to produce high quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective method to achieve emphasized speech consists…
External link:
http://arxiv.org/abs/2307.07062
Author:
Makarov, Peter, Abbas, Ammar, Łajszczak, Mateusz, Joly, Arnaud, Karlapati, Sri, Moinet, Alexis, Drugman, Thomas, Karanasou, Penny
Generating expressive and contextually appropriate prosody remains a challenge for modern text-to-speech (TTS) systems. This is particularly evident for long, multi-sentence inputs. In this paper, we examine simple extensions to a Transformer-based F…
External link:
http://arxiv.org/abs/2206.14643
Author:
Karlapati, Sri, Karanasou, Penny, Lajszczak, Mateusz, Abbas, Ammar, Moinet, Alexis, Makarov, Peter, Li, Ray, van Korlaar, Arent, Slangen, Simon, Drugman, Thomas
In this paper, we present CopyCat2 (CC2), a novel model capable of: a) synthesizing speech with different speaker identities, b) generating speech with expressive and contextually appropriate prosody, and c) transferring prosody at fine-grained level…
External link:
http://arxiv.org/abs/2206.13443
Author:
Lajszczak, Mateusz, Prasad, Animesh, van Korlaar, Arent, Bollepalli, Bajibabu, Bonafonte, Antonio, Joly, Arnaud, Nicolis, Marco, Moinet, Alexis, Drugman, Thomas, Wood, Trevor, Sokolova, Elena
This paper presents a novel data augmentation technique for text-to-speech (TTS) that allows generating new (text, audio) training examples without requiring any additional data. Our goal is to increase diversity of text conditionings available during…
External link:
http://arxiv.org/abs/2202.06409
We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE) architecture using a split vector quantizer for NTTS, as an enhancement to the well-known Variational Autoencoder (VAE) and Vector Quantized Variational Autoencoder (VQ-VAE) archit…
External link:
http://arxiv.org/abs/2110.12539
Author:
Korzekwa, Daniel, Barra-Chicote, Roberto, Kostek, Bozena, Drugman, Thomas, Lajszczak, Mateusz
This paper proposes a novel approach for the detection and reconstruction of dysarthric speech. The encoder-decoder model factorizes speech into a low-dimensional latent space and encoding of the input text. We show that the latent space conveys in…
External link:
http://arxiv.org/abs/1907.04743
Author:
Prateek, Nishant, Łajszczak, Mateusz, Barra-Chicote, Roberto, Drugman, Thomas, Lorenzo-Trueba, Jaime, Merritt, Thomas, Ronanki, Srikanth, Wood, Trevor
Neural text-to-speech synthesis (NTTS) models have shown significant progress in generating high-quality speech; however, they require a large quantity of training data. This makes creating models for multiple styles expensive and time-consuming. In t…
External link:
http://arxiv.org/abs/1904.02790