Showing 1 - 10 of 111 for search: '"Gosztolya Gábor"'
Author:
Vetráb Mercedes, Gosztolya Gábor
Published in:
Acta Universitatis Sapientiae: Informatica, Vol 14, Iss 1, Pp 1-21 (2022)
The problem of varying length recordings is a well-known issue in paralinguistics. We investigated how to resolve this problem using the bag-of-audio-words feature extraction approach. The steps of this technique involve preprocessing, clustering, qu
External link:
https://doaj.org/article/0cee22bacdfe4a2987f1530c183bb1ac
Published in:
Proceedings of Interspeech 2023
Thanks to the latest deep learning algorithms, silent speech interfaces (SSI) are now able to synthesize intelligible speech from articulatory movement data under certain conditions. However, the resulting models are rather speaker-specific, making a
External link:
http://arxiv.org/abs/2305.19130
Author:
Zainkó, Csaba, Tóth, László, Shandiz, Amin Honarmandi, Gosztolya, Gábor, Markó, Alexandra, Németh, Géza, Csapó, Tamás Gábor
For articulatory-to-acoustic mapping, typically only limited parallel training data is available, making it impossible to apply fully end-to-end solutions like Tacotron2. In this paper, we experimented with transfer learning and adaptation of a Tacot
External link:
http://arxiv.org/abs/2107.12051
Articulatory information has been shown to be effective in improving the performance of HMM-based and DNN-based text-to-speech synthesis. Speech synthesis research focuses traditionally on text-to-speech conversion, when the input is text or an estim
External link:
http://arxiv.org/abs/2107.02003
Author:
Shandiz, Amin Honarmandi, Tóth, László, Gosztolya, Gábor, Markó, Alexandra, Csapó, Tamás Gábor
Articulatory-to-acoustic mapping seeks to reconstruct speech from a recording of the articulatory movements, for example, an ultrasound video. Just like speech signals, these recordings represent not only the linguistic content, but are also highly s
External link:
http://arxiv.org/abs/2106.04552
Author:
Shandiz, Amin Honarmandi, Tóth, László, Gosztolya, Gábor, Markó, Alexandra, Csapó, Tamás Gábor
Besides the well-known classification task, these days neural networks are frequently being applied to generate or transform data, such as images and audio signals. In such tasks, the conventional loss functions like the mean squared error (MSE) may
External link:
http://arxiv.org/abs/2104.11601
Author:
Gosztolya, Gábor, Tóth, László
The 2020 INTERSPEECH Computational Paralinguistics Challenge (ComParE) consists of three Sub-Challenges, where the tasks are to identify the level of arousal and valence of elderly speakers, determine whether the actual speaker wearing a surgical mas
External link:
http://arxiv.org/abs/2008.03183
For articulatory-to-acoustic mapping using deep neural networks, typically spectral and excitation parameters of vocoders have been used as the training targets. However, vocoding often results in buzzy and muffled final speech quality. Therefore, in
External link:
http://arxiv.org/abs/2008.03152
Author:
Csapó, Tamás Gábor, Al-Radhi, Mohammed Salah, Németh, Géza, Gosztolya, Gábor, Grósz, Tamás, Tóth, László, Markó, Alexandra
Recently it was shown that within the Silent Speech Interface (SSI) field, the prediction of F0 is possible from Ultrasound Tongue Images (UTI) as the articulatory input, using Deep Neural Networks for articulatory-to-acoustic mapping. Moreover, text
External link:
http://arxiv.org/abs/1906.09885
Author:
Gosztolya, Gábor, Pintér, Ádám, Tóth, László, Grósz, Tamás, Markó, Alexandra, Csapó, Tamás Gábor
When using ultrasound video as input, Deep Neural Network-based Silent Speech Interfaces usually rely on the whole image to estimate the spectral parameters required for the speech synthesis step. Although this approach is quite straightforward, and
External link:
http://arxiv.org/abs/1904.05259