Search results - "Maniati, Georgia"

Report

Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations

Authors: Kakoulidis, Panos; Ellinas, Nikolaos; Vamvoukakis, Georgios; Christidou, Myrsini; Vioni, Alexandra; Maniati, Georgia; Oh, Junkwang; Jho, Gunu; Hwang, Inchul; Tsiakoulis, Pirros; Chalamandaris, Aimilios

In this paper, we propose a singing voice synthesis model, Karaoker-SSL, that is trained only on text and speech data as a typical multi-speaker acoustic model. It is a low-resource pipeline that does not utilize any singing data end-to-end, since it

External link: http://arxiv.org/abs/2402.01520

Report

Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis

Authors: Nikitaras, Karolos; Klapsas, Konstantinos; Ellinas, Nikolaos; Maniati, Georgia; Sung, June Sig; Hwang, Inchul; Raptis, Spyros; Chalamandaris, Aimilios; Tsiakoulis, Pirros

This paper proposes an Expressive Speech Synthesis model that utilizes token-level latent prosodic variables in order to capture and control utterance-level attributes, such as character acting voice and speaking style. Current works aim to explicitl

External link: http://arxiv.org/abs/2211.00523

Report

Generating Multilingual Gender-Ambiguous Text-to-Speech Voices

Authors: Markopoulos, Konstantinos; Maniati, Georgia; Vamvoukakis, Georgios; Ellinas, Nikolaos; Vardaxoglou, Georgios; Kakoulidis, Panos; Oh, Junkwang; Jho, Gunu; Hwang, Inchul; Chalamandaris, Aimilios; Tsiakoulis, Pirros; Raptis, Spyros

The gender of any voice user interface is a key element of its perceived identity. Recently, there has been increasing interest in interfaces where the gender is ambiguous rather than clearly identifying as female or male. This work addresses the tas

External link: http://arxiv.org/abs/2211.00375

Report

Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features

Authors: Vioni, Alexandra; Maniati, Georgia; Ellinas, Nikolaos; Sung, June Sig; Hwang, Inchul; Chalamandaris, Aimilios; Tsiakoulis, Pirros

Current state-of-the-art methods for automatic synthetic speech evaluation are based on MOS prediction neural models. Such MOS prediction models include MOSNet and LDNet that use spectral features as input, and SSL-MOS that relies on a pretrained sel

External link: http://arxiv.org/abs/2211.00342

Report

Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation

Authors: Ellinas, Nikolaos; Vamvoukakis, Georgios; Markopoulos, Konstantinos; Maniati, Georgia; Kakoulidis, Panos; Sung, June Sig; Hwang, Inchul; Raptis, Spyros; Chalamandaris, Aimilios; Tsiakoulis, Pirros

This paper presents a method for end-to-end cross-lingual text-to-speech (TTS) which aims to preserve the target language's pronunciation regardless of the original speaker's language. The model used is based on a non-attentive Tacotron architecture,

External link: http://arxiv.org/abs/2210.17264

Report

SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis

Authors: Maniati, Georgia; Vioni, Alexandra; Ellinas, Nikolaos; Nikitaras, Karolos; Klapsas, Konstantinos; Sung, June Sig; Jho, Gunu; Chalamandaris, Aimilios; Tsiakoulis, Pirros

In this work, we present the SOMOS dataset, the first large-scale mean opinion scores (MOS) dataset consisting of solely neural text-to-speech (TTS) samples. It can be employed to train automatic MOS prediction systems focused on the assessment of mo

External link: http://arxiv.org/abs/2204.03040

Report

Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control

Authors: Markopoulos, Konstantinos; Ellinas, Nikolaos; Vioni, Alexandra; Christidou, Myrsini; Kakoulidis, Panos; Vamvoukakis, Georgios; Maniati, Georgia; Sung, June Sig; Park, Hyoungmin; Tsiakoulis, Pirros; Chalamandaris, Aimilios

In this paper, a text-to-rapping/singing system is introduced, which can be adapted to any speaker's voice. It utilizes a Tacotron-based multispeaker acoustic model trained on read-only speech data and which provides prosody control at the phoneme le

External link: http://arxiv.org/abs/2111.09146

Report

Cross-lingual Low Resource Speaker Adaptation Using Phonological Features

Authors: Maniati, Georgia; Ellinas, Nikolaos; Markopoulos, Konstantinos; Vamvoukakis, Georgios; Sung, June Sig; Park, Hyoungmin; Chalamandaris, Aimilios; Tsiakoulis, Pirros

The idea of using phonological features instead of phonemes as input to sequence-to-sequence TTS has been recently proposed for zero-shot multilingual speech synthesis. This approach is useful for code-switching, as it facilitates the seamless utteri

External link: http://arxiv.org/abs/2111.09075

Report

High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency

Authors: Ellinas, Nikolaos; Vamvoukakis, Georgios; Markopoulos, Konstantinos; Chalamandaris, Aimilios; Maniati, Georgia; Kakoulidis, Panos; Raptis, Spyros; Sung, June Sig; Park, Hyoungmin; Tsiakoulis, Pirros

This paper presents an end-to-end text-to-speech system with low latency on a CPU, suitable for real-time applications. The system is composed of an autoregressive attention-based sequence-to-sequence acoustic model and the LPCNet vocoder for wavefor

External link: http://arxiv.org/abs/2111.09052

