Search results - "JOLY, Arnaud"

Report

Controllable Emphasis with zero data for text-to-speech

Authors: Joly, Arnaud; Nicolis, Marco; Peterova, Ekaterina; Lombardi, Alessandro; Abbas, Ammar; van Korlaar, Arent; Hussain, Aman; Sharma, Parul; Moinet, Alexis; Lajszczak, Mateusz; Karanasou, Penny; Bonafonte, Antonio; Drugman, Thomas; Sokolova, Elena

We present a scalable method to produce high quality emphasis for text-to-speech (TTS) that does not require recordings or annotations. Many TTS models include a phoneme duration model. A simple but effective method to achieve emphasized speech consi

External link: http://arxiv.org/abs/2307.07062

Report

Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody

Authors: Makarov, Peter; Abbas, Ammar; Łajszczak, Mateusz; Joly, Arnaud; Karlapati, Sri; Moinet, Alexis; Drugman, Thomas; Karanasou, Penny

Generating expressive and contextually appropriate prosody remains a challenge for modern text-to-speech (TTS) systems. This is particularly evident for long, multi-sentence inputs. In this paper, we examine simple extensions to a Transformer-based F

External link: http://arxiv.org/abs/2206.14643

Report

Distribution augmentation for low-resource expressive text-to-speech

Authors: Lajszczak, Mateusz; Prasad, Animesh; van Korlaar, Arent; Bollepalli, Bajibabu; Bonafonte, Antonio; Joly, Arnaud; Nicolis, Marco; Moinet, Alexis; Drugman, Thomas; Wood, Trevor; Sokolova, Elena

This paper presents a novel data augmentation technique for text-to-speech (TTS) that makes it possible to generate new (text, audio) training examples without requiring any additional data. Our goal is to increase diversity of text conditionings available dur

External link: http://arxiv.org/abs/2202.06409

Report

Multi-Scale Spectrogram Modelling for Neural Text-to-Speech

Authors: Abbas, Ammar; Bollepalli, Bajibabu; Moinet, Alexis; Joly, Arnaud; Karanasou, Penny; Makarov, Peter; Slangens, Simon; Karlapati, Sri; Drugman, Thomas

We propose a novel Multi-Scale Spectrogram (MSS) modelling approach to synthesise speech with an improved coarse and fine-grained prosody. We present a generic multi-scale spectrogram prediction mechanism where the system first predicts coarser scale

External link: http://arxiv.org/abs/2106.15649

Report

A learned conditional prior for the VAE acoustic space of a TTS system

Authors: Karanasou, Penny; Karlapati, Sri; Moinet, Alexis; Joly, Arnaud; Abbas, Ammar; Slangen, Simon; Trueba, Jaime Lorenzo; Drugman, Thomas

Many factors influence speech yielding different renditions of a given sentence. Generative models, such as variational autoencoders (VAEs), capture this variability and allow multiple renditions of the same sentence via sampling. The degree of proso

External link: http://arxiv.org/abs/2106.10229

Report

Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech

Authors: Karlapati, Sri; Abbas, Ammar; Hodari, Zack; Moinet, Alexis; Joly, Arnaud; Karanasou, Penny; Drugman, Thomas

In this paper, we introduce Kathaka, a model trained with a novel two-stage training process for neural speech synthesis with contextually appropriate prosody. In Stage I, we learn a prosodic distribution at the sentence level from mel-spectrograms a

External link: http://arxiv.org/abs/2011.02252

Report

CAMP: a Two-Stage Approach to Modelling Prosody in Context

Authors: Hodari, Zack; Moinet, Alexis; Karlapati, Sri; Lorenzo-Trueba, Jaime; Merritt, Thomas; Joly, Arnaud; Abbas, Ammar; Karanasou, Penny; Drugman, Thomas

Prosody is an integral part of communication, but remains an open problem in state-of-the-art speech synthesis. There are two major issues faced when modelling prosody: (1) prosody varies at a slower rate compared with other content in the acoustic s

External link: http://arxiv.org/abs/2011.01175

Report

CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech

Authors: Karlapati, Sri; Moinet, Alexis; Joly, Arnaud; Klimkov, Viacheslav; Sáez-Trigueros, Daniel; Drugman, Thomas

Published in: INTERSPEECH 2020, pp. 4387-4391

Prosody Transfer (PT) is a technique that aims to use the prosody from a source audio as a reference while synthesising speech. Fine-grained PT aims at capturing prosodic aspects like rhythm, emphasis, melody, duration, and loudness, from a source au

External link: http://arxiv.org/abs/2004.14617

Report

Gradient tree boosting with random output projections for multi-label classification and multi-output regression

Authors: Joly, Arnaud; Wehenkel, Louis; Geurts, Pierre

In many applications of supervised learning, multiple classification or regression outputs have to be predicted jointly. We consider several extensions of gradient boosting to address such problems. We first propose a straightforward adaptation of gr

External link: http://arxiv.org/abs/1905.07558
