Showing 1 - 10 of 43 for search: '"Pirros Tsiakoulis"'
Author:
Konstantinos Klapsas, Nikolaos Ellinas, Karolos Nikitaras, Georgios Vamvoukakis, Panagiotis Kakoulidis, Konstantinos Markopoulos, Spyros Raptis, June Sig Sung, Gunu Jho, Aimilios Chalamandaris, Pirros Tsiakoulis
Published in:
Interspeech 2022.
Author:
Georgia Maniati, Alexandra Vioni, Nikolaos Ellinas, Karolos Nikitaras, Konstantinos Klapsas, June Sig Sung, Gunu Jho, Aimilios Chalamandaris, Pirros Tsiakoulis
In this work, we present the SOMOS dataset, the first large-scale mean opinion scores (MOS) dataset consisting of solely neural text-to-speech (TTS) samples. It can be employed to train automatic MOS prediction systems focused on the assessment of mo
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::cc651faeaa04115356a3dc709682b63a
http://arxiv.org/abs/2204.03040
Author:
Nikolaos Ellinas, Georgios Vamvoukakis, Tae-Hoon Kim, Aimilios Chalamandaris, Panos Kakoulidis, Hyoungmin Park, Myrsini Christidou, Alexandra Vioni, June Sig Sung, Pirros Tsiakoulis
Published in:
ICASSP
This paper presents a method for controlling the prosody at the phoneme level in an autoregressive attention-based text-to-speech system. Instead of learning latent prosodic features with a variational framework as is commonly done, we directly extra
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::17c8aee70c6bacd61d322c38a73e80b8
http://arxiv.org/abs/2111.10177
Author:
Aimilios Chalamandaris, Georgios Vamvoukakis, Nikolaos Ellinas, Konstantinos Markopoulos, Pirros Tsiakoulis, June Sig Sung, Hyoungmin Park, Georgia Maniati
The idea of using phonological features instead of phonemes as input to sequence-to-sequence TTS has been recently proposed for zero-shot multilingual speech synthesis. This approach is useful for code-switching, as it facilitates the seamless utteri
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1a24fc2c09faa4d72db9308dddd537a4
http://arxiv.org/abs/2111.09075
Author:
Konstantinos Markopoulos, Myrsini Christidou, June Sig Sung, Aimilios Chalamandaris, Pirros Tsiakoulis, Georgios Vamvoukakis, Alexandra Vioni, Panos Kakoulidis, Hyoungmin Park, Nikolaos Ellinas, Georgia Maniati
In this paper, a text-to-rapping/singing system is introduced, which can be adapted to any speaker's voice. It utilizes a Tacotron-based multispeaker acoustic model trained on read-only speech data and which provides prosody control at the phoneme le
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::16fb08b35ac399ca9b1c19767b2451d6
http://arxiv.org/abs/2111.09146
Author:
Pirros Tsiakoulis, Panos Kakoulidis, June Sig Sung, Konstantinos Markopoulos, Myrsini Christidou, Nikolaos Ellinas, Alexandra Vioni, Georgios Vamvoukakis, Aimilios Chalamandaris, Hyoungmin Park
Published in:
Speech and Computer ISBN: 9783030878016
This paper presents a method for phoneme-level prosody control of F0 and duration on a multispeaker text-to-speech setup, which is based on prosodic clustering. An autoregressive attention-based model is used, incorporating multispeaker architecture
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::cdaf2cb1601bb31210d84862c313de8d
https://doi.org/10.1007/978-3-030-87802-3_11
Author:
June Sig Sung, Spyros Raptis, Pirros Tsiakoulis, Nikolaos Ellinas, Aimilios Chalamandaris, Hyoungmin Park, Georgia Maniati, Georgios Vamvoukakis, Panos Kakoulidis, Konstantinos Markopoulos
Published in:
INTERSPEECH
This paper presents an end-to-end text-to-speech system with low latency on a CPU, suitable for real-time applications. The system is composed of an autoregressive attention-based sequence-to-sequence acoustic model and the LPCNet vocoder for wavefor
Published in:
Speech Communication. 95:137-152
High quality expressive speech synthesis has been a long-standing goal towards natural human-computer interaction. Generating a talking head which is both realistic and expressive appears to be a considerable challenge, due to both the high complexit
Published in:
Journal on Multimodal User Interfaces. 9:387-394
Emotion-aware computing presents one of the key challenges in contemporary natural human interaction research in which emotional speech is an essential modality in multimodal user interfaces. Speech modality relates mainly to speech emotion and affec
Published in:
PCI
This work explores affective word ratings as an auxiliary target cost for unit-selection-based concatenative speech synthesis. The method does not require task-specific crafted corpora, nor does it rely on additional annotations, making it ideal for