Showing 1 - 10 of 85 for search: '"Thomas Drugman"'
Author:
Peter Makarov, Syed Ammar Abbas, Mateusz Łajszczak, Arnaud Joly, Sri Karlapati, Alexis Moinet, Thomas Drugman, Penny Karanasou
Generating expressive and contextually appropriate prosody remains a challenge for modern text-to-speech (TTS) systems. This is particularly evident for long, multi-sentence inputs. In this paper, we examine simple extensions to a Transformer-based F…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::76854a79fc5858f735a05d6824f9eda7
http://arxiv.org/abs/2206.14643
Author:
Syed Ammar Abbas, Thomas Merritt, Alexis Moinet, Sri Karlapati, Ewa Muszynska, Simon Slangen, Elia Gatti, Thomas Drugman
Duration modelling has become an important research problem once more with the rise of non-attention neural text-to-speech systems. The current approaches largely fall back to relying on previous statistical parametric speech synthesis technology for…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::79880a0740607b602f22960ff55384bf
http://arxiv.org/abs/2206.14165
Author:
Sri Karlapati, Penny Karanasou, Mateusz Łajszczak, Syed Ammar Abbas, Alexis Moinet, Peter Makarov, Ray Li, Arent van Korlaar, Simon Slangen, Thomas Drugman
In this paper, we present CopyCat2 (CC2), a novel model capable of: a) synthesizing speech with different speaker identities, b) generating speech with expressive and contextually appropriate prosody, and c) transferring prosody at a fine-grained level…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::df16965a817345bcda4ceb2494b51158
http://arxiv.org/abs/2206.13443
The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8532d016275ed97d9ea5a1955be3a96d
Published in:
11th ISCA Speech Synthesis Workshop (SSW 11).
Emotional voice conversion models adapt the emotion in speech without changing the speaker identity or linguistic content. They are less data-hungry than text-to-speech models and allow generating large amounts of emotional data for downstream tasks…
Author:
Bajibabu Bollepalli, Arnaud Joly, Penny Karanasou, Simon Slangen, Thomas Drugman, Ammar Abbas, Peter Makarov, Alexis Moinet, Sri Karlapati
We propose a novel Multi-Scale Spectrogram (MSS) modelling approach to synthesise speech with an improved coarse and fine-grained prosody. We present a generic multi-scale spectrogram prediction mechanism where the system first predicts coarser scale…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1b9ef65e9e72575811bb74c80c567052
http://arxiv.org/abs/2106.15649
Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker. While there is a rich literature on VC, most proposed methods are trained and evaluated…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::7326f3cd106244f52a09c67c72ae71e6
http://arxiv.org/abs/2106.08873
Author:
Penny Karanasou, Thomas Drugman, Sri Karlapati, Zack Hodari, Ammar Abbas, Arnaud Joly, Alexis Moinet
Published in:
ICASSP
In this paper, we introduce Kathaka, a model trained with a novel two-stage training process for neural speech synthesis with contextually appropriate prosody. In Stage I, we learn a prosodic distribution at the sentence level from mel-spectrograms a…
Author:
Daniel Korzekwa, Bozena Kostek, Jaime Lorenzo-Trueba, Szymon Zaporowski, Shira Calamaro, Thomas Drugman
Published in:
ICASSP
A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare them to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e37344559f3eae8aa9fdaffd3082b67b
We propose a weakly-supervised model for word-level mispronunciation detection in non-native (L2) English speech. To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words. The lack of phonet…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::05b113dec4fef8da1441fd5048af5a31