Výsledky vyhledávání - "Gosztolya, Gábor"

Report

Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks

Autor: Tóth, László, Shandiz, Amin Honarmandi, Gosztolya, Gábor, Gábor, Csapó Tamás

Publikováno v: the Proceedings of Interspeech 2023

Thanks to the latest deep learning algorithms, silent speech interfaces (SSI) are now able to synthesize intelligible speech from articulatory movement data under certain conditions. However, the resulting models are rather speaker-specific, making a

Externí odkaz: http://arxiv.org/abs/2305.19130

Zobrazit plný text záznamu

Report

Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging

Autor: Zainkó, Csaba, Tóth, László, Shandiz, Amin Honarmandi, Gosztolya, Gábor, Markó, Alexandra, Németh, Géza, Csapó, Tamás Gábor

For articulatory-to-acoustic mapping, typically only limited parallel training data is available, making it impossible to apply fully end-to-end solutions like Tacotron2. In this paper, we experimented with transfer learning and adaptation of a Tacot

Externí odkaz: http://arxiv.org/abs/2107.12051

Zobrazit plný text záznamu

Report

Speech Synthesis from Text and Ultrasound Tongue Image-based Articulatory Input

Autor: Csapó, Tamás Gábor, Tóth, László, Gosztolya, Gábor, Markó, Alexandra

Articulatory information has been shown to be effective in improving the performance of HMM-based and DNN-based text-to-speech synthesis. Speech synthesis research focuses traditionally on text-to-speech conversion, when the input is text or an estim

Externí odkaz: http://arxiv.org/abs/2107.02003

Zobrazit plný text záznamu

Report

Neural Speaker Embeddings for Ultrasound-based Silent Speech Interfaces

Autor: Shandiz, Amin Honarmandi, Tóth, László, Gosztolya, Gábor, Markó, Alexandra, Csapó, Tamás Gábor

Articulatory-to-acoustic mapping seeks to reconstruct speech from a recording of the articulatory movements, for example, an ultrasound video. Just like speech signals, these recordings represent not only the linguistic content, but are also highly s

Externí odkaz: http://arxiv.org/abs/2106.04552

Zobrazit plný text záznamu

Report

Improving Neural Silent Speech Interface Models by Adversarial Training

Autor: Shandiz, Amin Honarmandi, Tóth, László, Gosztolya, Gábor, Markó, Alexandra, Csapó, Tamás Gábor

Besides the well-known classification task, these days neural networks are frequently being applied to generate or transform data, such as images and audio signals. In such tasks, the conventional loss functions like the mean squared error (MSE) may

Externí odkaz: http://arxiv.org/abs/2104.11601

Zobrazit plný text záznamu

Report

Applying Speech Tempo-Derived Features, BoAW and Fisher Vectors to Detect Elderly Emotion and Speech in Surgical Masks

Autor: Gosztolya, Gábor, Tóth, László

The 2020 INTERSPEECH Computational Paralinguistics Challenge (ComParE) consists of three Sub-Challenges, where the tasks are to identify the level of arousal and valence of elderly speakers, determine whether the actual speaker wearing a surgical mas

Externí odkaz: http://arxiv.org/abs/2008.03183

Zobrazit plný text záznamu

Report

Ultrasound-based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis

Autor: Csapó, Tamás Gábor, Zainkó, Csaba, Tóth, László, Gosztolya, Gábor, Markó, Alexandra

For articulatory-to-acoustic mapping using deep neural networks, typically spectral and excitation parameters of vocoders have been used as the training targets. However, vocoding often results in buzzy and muffled final speech quality. Therefore, in

Externí odkaz: http://arxiv.org/abs/2008.03152

Zobrazit plný text záznamu

Report

Ultrasound-based Silent Speech Interface Built on a Continuous Vocoder

Autor: Csapó, Tamás Gábor, Al-Radhi, Mohammed Salah, Németh, Géza, Gosztolya, Gábor, Grósz, Tamás, Tóth, László, Markó, Alexandra

Recently it was shown that within the Silent Speech Interface (SSI) field, the prediction of F0 is possible from Ultrasound Tongue Images (UTI) as the articulatory input, using Deep Neural Networks for articulatory-to-acoustic mapping. Moreover, text

Externí odkaz: http://arxiv.org/abs/1906.09885

Zobrazit plný text záznamu

Report

Autoencoder-Based Articulatory-to-Acoustic Mapping for Ultrasound Silent Speech Interfaces

Autor: Gosztolya, Gábor, Pintér, Ádám, Tóth, László, Grósz, Tamás, Markó, Alexandra, Csapó, Tamás Gábor

When using ultrasound video as input, Deep Neural Network-based Silent Speech Interfaces usually rely on the whole image to estimate the spectral parameters required for the speech synthesis step. Although this approach is quite straightforward, and

Externí odkaz: http://arxiv.org/abs/1904.05259

Zobrazit plný text záznamu

Akademický článek

Tento výsledek nelze pro nepřihlášené uživatele zobrazit.
K zobrazení výsledku je třeba se přihlásit.

Vyhledávací nástroje:

Upřesnit hledání