Showing 1 - 10 of 85 for search: '"Thomas Drugman"'
Author:
Peter Makarov, Syed Ammar Abbas, Mateusz Łajszczak, Arnaud Joly, Sri Karlapati, Alexis Moinet, Thomas Drugman, Penny Karanasou
Generating expressive and contextually appropriate prosody remains a challenge for modern text-to-speech (TTS) systems. This is particularly evident for long, multi-sentence inputs. In this paper, we examine simple extensions to a Transformer-based F…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::76854a79fc5858f735a05d6824f9eda7
http://arxiv.org/abs/2206.14643
Author:
Syed Ammar Abbas, Thomas Merritt, Alexis Moinet, Sri Karlapati, Ewa Muszynska, Simon Slangen, Elia Gatti, Thomas Drugman
Duration modelling has become an important research problem once more with the rise of non-attention neural text-to-speech systems. The current approaches largely fall back to relying on previous statistical parametric speech synthesis technology for…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::79880a0740607b602f22960ff55384bf
http://arxiv.org/abs/2206.14165
Author:
Sri Karlapati, Penny Karanasou, Mateusz Łajszczak, Syed Ammar Abbas, Alexis Moinet, Peter Makarov, Ray Li, Arent van Korlaar, Simon Slangen, Thomas Drugman
In this paper, we present CopyCat2 (CC2), a novel model capable of: a) synthesizing speech with different speaker identities, b) generating speech with expressive and contextually appropriate prosody, and c) transferring prosody at a fine-grained level…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::df16965a817345bcda4ceb2494b51158
http://arxiv.org/abs/2206.13443
The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8532d016275ed97d9ea5a1955be3a96d
Published in:
11th ISCA Speech Synthesis Workshop (SSW 11).
Emotional voice conversion models adapt the emotion in speech without changing the speaker identity or linguistic content. They are less data-hungry than text-to-speech models and allow generating large amounts of emotional data for downstream tasks…
Author:
Bajibabu Bollepalli, Arnaud Joly, Penny Karanasou, Simon Slangen, Thomas Drugman, Ammar Abbas, Peter Makarov, Alexis Moinet, Sri Karlapati
We propose a novel Multi-Scale Spectrogram (MSS) modelling approach to synthesise speech with an improved coarse and fine-grained prosody. We present a generic multi-scale spectrogram prediction mechanism where the system first predicts coarser scale…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1b9ef65e9e72575811bb74c80c567052
http://arxiv.org/abs/2106.15649
Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker. While there is a rich literature on VC, most proposed methods are trained and evaluated…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::7326f3cd106244f52a09c67c72ae71e6
http://arxiv.org/abs/2106.08873
Author:
Penny Karanasou, Thomas Drugman, Sri Karlapati, Zack Hodari, Ammar Abbas, Arnaud Joly, Alexis Moinet
Published in:
ICASSP
In this paper, we introduce Kathaka, a model trained with a novel two-stage training process for neural speech synthesis with contextually appropriate prosody. In Stage I, we learn a prosodic distribution at the sentence level from mel-spectrograms a…
Author:
Daniel Korzekwa, Bozena Kostek, Jaime Lorenzo-Trueba, Szymon Zaporowski, Shira Calamaro, Thomas Drugman
Published in:
ICASSP
A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare them to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e37344559f3eae8aa9fdaffd3082b67b
We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonet…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::05b113dec4fef8da1441fd5048af5a31