Search results - "Klapsas, Konstantinos"

Report

Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis

Authors: Klapsas, Konstantinos, Nikitaras, Karolos, Ellinas, Nikolaos, Sung, June Sig, Hwang, Inchul, Raptis, Spyros, Chalamandaris, Aimilios, Tsiakoulis, Pirros

A large part of the expressive speech synthesis literature focuses on learning prosodic representations of the speech signal which are then modeled by a prior distribution during inference. In this paper, we compare different prior architectures at t…

External link: http://arxiv.org/abs/2211.01327

Report

Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis

Authors: Nikitaras, Karolos, Klapsas, Konstantinos, Ellinas, Nikolaos, Maniati, Georgia, Sung, June Sig, Hwang, Inchul, Raptis, Spyros, Chalamandaris, Aimilios, Tsiakoulis, Pirros

This paper proposes an Expressive Speech Synthesis model that utilizes token-level latent prosodic variables in order to capture and control utterance-level attributes, such as character acting voice and speaking style. Current works aim to explicitl…

External link: http://arxiv.org/abs/2211.00523

Report

Fine-grained Noise Control for Multispeaker Speech Synthesis

Authors: Nikitaras, Karolos, Vamvoukakis, Georgios, Ellinas, Nikolaos, Klapsas, Konstantinos, Markopoulos, Konstantinos, Raptis, Spyros, Sung, June Sig, Jho, Gunu, Chalamandaris, Aimilios, Tsiakoulis, Pirros

A text-to-speech (TTS) model typically factorizes speech attributes such as content, speaker and prosody into disentangled representations. Recent works aim to additionally model the acoustic conditions explicitly, in order to disentangle the primary…

External link: http://arxiv.org/abs/2204.05070

Report

Self-supervised learning for robust voice cloning

Authors: Klapsas, Konstantinos, Ellinas, Nikolaos, Nikitaras, Karolos, Vamvoukakis, Georgios, Kakoulidis, Panos, Markopoulos, Konstantinos, Raptis, Spyros, Sung, June Sig, Jho, Gunu, Chalamandaris, Aimilios, Tsiakoulis, Pirros

Voice cloning is a difficult task which requires robust and informative features incorporated in a high quality TTS system in order to effectively copy an unseen speaker's voice. In our work, we utilize features learned in a self-supervised framework…

External link: http://arxiv.org/abs/2204.03421

Report

SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis

Authors: Maniati, Georgia, Vioni, Alexandra, Ellinas, Nikolaos, Nikitaras, Karolos, Klapsas, Konstantinos, Sung, June Sig, Jho, Gunu, Chalamandaris, Aimilios, Tsiakoulis, Pirros

In this work, we present the SOMOS dataset, the first large-scale mean opinion scores (MOS) dataset consisting of solely neural text-to-speech (TTS) samples. It can be employed to train automatic MOS prediction systems focused on the assessment of mo…

External link: http://arxiv.org/abs/2204.03040

Report

Word-Level Style Control for Expressive, Non-attentive Speech Synthesis

Authors: Klapsas, Konstantinos, Ellinas, Nikolaos, Sung, June Sig, Park, Hyoungmin, Raptis, Spyros

This paper presents an expressive speech synthesis architecture for modeling and controlling the speaking style at a word level. It attempts to learn word-level stylistic and prosodic representations of the speech data, with the aid of two encoders.

External link: http://arxiv.org/abs/2111.10173