Showing 1 - 9 of 9
for search: '"Łajszczak, Mateusz"'
Author:
Łajszczak, Mateusz, Cámbara, Guillermo, Li, Yang, Beyhan, Fatih, van Korlaar, Arent, Yang, Fan, Joly, Arnaud, Martín-Cortinas, Álvaro, Abbas, Ammar, Michalski, Adam, Moinet, Alexis, Karlapati, Sri, Muszyńska, Ewa, Guo, Haohan, Putrycz, Bartosz, Gambino, Soledad López, Yoo, Kayeon, Sokolova, Elena, Drugman, Thomas
We introduce a text-to-speech (TTS) model called BASE TTS, which stands for Big Adaptive Streamable TTS with Emergent abilities. BASE TTS is the largest TTS model to date, trained on 100K hours of public domain…
External link:
http://arxiv.org/abs/2402.08093
Author:
Martín-Cortinas, Álvaro, Sáez-Trigueros, Daniel, Vallés-Pérez, Iván, Tura-Vecino, Biel, Biliński, Piotr, Lajszczak, Mateusz, Beringer, Grzegorz, Barra-Chicote, Roberto, Lorenzo-Trueba, Jaime
Large Language Models (LLMs) are one of the most promising technologies for the next era of speech generation systems, due to their scalability and in-context learning capabilities. Nevertheless, they suffer from multiple stability issues at inference…
External link:
http://arxiv.org/abs/2402.03407
Author:
Joly, Arnaud, Nicolis, Marco, Peterova, Ekaterina, Lombardi, Alessandro, Abbas, Ammar, van Korlaar, Arent, Hussain, Aman, Sharma, Parul, Moinet, Alexis, Lajszczak, Mateusz, Karanasou, Penny, Bonafonte, Antonio, Drugman, Thomas, Sokolova, Elena
We present a scalable method to produce high quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective method to achieve emphasized speech consists…
External link:
http://arxiv.org/abs/2307.07062
Author:
Makarov, Peter, Abbas, Ammar, Łajszczak, Mateusz, Joly, Arnaud, Karlapati, Sri, Moinet, Alexis, Drugman, Thomas, Karanasou, Penny
Generating expressive and contextually appropriate prosody remains a challenge for modern text-to-speech (TTS) systems. This is particularly evident for long, multi-sentence inputs. In this paper, we examine simple extensions to a Transformer-based F…
External link:
http://arxiv.org/abs/2206.14643
Author:
Karlapati, Sri, Karanasou, Penny, Lajszczak, Mateusz, Abbas, Ammar, Moinet, Alexis, Makarov, Peter, Li, Ray, van Korlaar, Arent, Slangen, Simon, Drugman, Thomas
In this paper, we present CopyCat2 (CC2), a novel model capable of: a) synthesizing speech with different speaker identities, b) generating speech with expressive and contextually appropriate prosody, and c) transferring prosody at fine-grained level…
External link:
http://arxiv.org/abs/2206.13443
Author:
Lajszczak, Mateusz, Prasad, Animesh, van Korlaar, Arent, Bollepalli, Bajibabu, Bonafonte, Antonio, Joly, Arnaud, Nicolis, Marco, Moinet, Alexis, Drugman, Thomas, Wood, Trevor, Sokolova, Elena
This paper presents a novel data augmentation technique for text-to-speech (TTS) that allows generating new (text, audio) training examples without requiring any additional data. Our goal is to increase diversity of text conditionings available during…
External link:
http://arxiv.org/abs/2202.06409
We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE) architecture using a split vector quantizer for NTTS, as an enhancement to the well-known Variational Autoencoder (VAE) and Vector Quantized Variational Autoencoder (VQ-VAE) archit…
External link:
http://arxiv.org/abs/2110.12539
Author:
Korzekwa, Daniel, Barra-Chicote, Roberto, Kostek, Bozena, Drugman, Thomas, Lajszczak, Mateusz
This paper proposes a novel approach for the detection and reconstruction of dysarthric speech. The encoder-decoder model factorizes speech into a low-dimensional latent space and encoding of the input text. We show that the latent space conveys in…
External link:
http://arxiv.org/abs/1907.04743
Author:
Prateek, Nishant, Łajszczak, Mateusz, Barra-Chicote, Roberto, Drugman, Thomas, Lorenzo-Trueba, Jaime, Merritt, Thomas, Ronanki, Srikanth, Wood, Trevor
Neural text-to-speech synthesis (NTTS) models have shown significant progress in generating high-quality speech; however, they require a large quantity of training data. This makes creating models for multiple styles expensive and time-consuming. In t…
External link:
http://arxiv.org/abs/1904.02790