Showing 1 - 10 of 114 for the search: '"Csapó, Tamás"'
The aim of the study is to investigate the complex mechanisms of speech perception and ultimately decode the electrical changes in the brain occurring while listening to speech. We attempt to decode heard speech from intracranial electroencephalograph…
External link:
http://arxiv.org/abs/2402.16996
Published in:
Proceedings of Interspeech 2023
Thanks to the latest deep learning algorithms, silent speech interfaces (SSI) are now able to synthesize intelligible speech from articulatory movement data under certain conditions. However, the resulting models are rather speaker-specific, making a…
External link:
http://arxiv.org/abs/2305.19130
Published in:
Proceedings of Interspeech 2023
Initial research has already proposed speech-based BCIs using brain signals (e.g., non-invasive EEG and invasive sEEG/ECoG), but there is a lack of combined methods that investigate non-invasive brain, articulation, and sp…
External link:
http://arxiv.org/abs/2306.05374
Neural network-based Text-to-Speech has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron2, FastSpeech, FastPitch) usually generate a Mel-spectrogram from text and then synthesize speech using a vocoder (e.g., Wa…
External link:
http://arxiv.org/abs/2208.07122
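The abstract above describes the common two-stage neural TTS pipeline: an acoustic model (e.g., Tacotron2 or FastSpeech) maps text to a Mel-spectrogram, and a vocoder maps the Mel-spectrogram to a waveform. A minimal structural sketch of that interface, using hypothetical stub models in place of the real networks:

```python
# Two-stage TTS pipeline sketch. The "models" below are placeholder stubs
# (not the actual Tacotron2/vocoder networks); only the data flow and
# tensor shapes are meant to be illustrative.
from typing import List

N_MELS = 80       # typical Mel-band count in these systems
HOP_LENGTH = 256  # waveform samples generated per spectrogram frame

def acoustic_model(text: str) -> List[List[float]]:
    """Stub for stage 1: emit one Mel frame (N_MELS values) per character."""
    return [[0.0] * N_MELS for _ in text]

def vocoder(mel: List[List[float]]) -> List[float]:
    """Stub for stage 2: emit HOP_LENGTH waveform samples per Mel frame."""
    return [0.0] * (len(mel) * HOP_LENGTH)

def synthesize(text: str) -> List[float]:
    mel = acoustic_model(text)  # text -> Mel-spectrogram
    return vocoder(mel)         # Mel-spectrogram -> waveform

wave = synthesize("hello")
print(len(wave))  # 5 frames * 256 samples = 1280
```

In a real system the acoustic model aligns text and frames with attention or explicit durations, so the frame count is not simply the character count; the stub fixes one frame per character only to keep the shapes concrete.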
Traditional vocoder-based statistical parametric speech synthesis can be advantageous in applications that require low computational complexity. Recent neural vocoders, which can produce high naturalness, still cannot fulfill the requirement of being…
External link:
http://arxiv.org/abs/2108.01154
Author:
Zainkó, Csaba, Tóth, László, Shandiz, Amin Honarmandi, Gosztolya, Gábor, Markó, Alexandra, Németh, Géza, Csapó, Tamás Gábor
For articulatory-to-acoustic mapping, typically only limited parallel training data is available, making it impossible to apply fully end-to-end solutions like Tacotron2. In this paper, we experimented with transfer learning and adaptation of a Tacot…
External link:
http://arxiv.org/abs/2107.12051
Author:
Csapó, Tamás Gábor
In this paper, we present our first experiments in text-to-articulation prediction, using ultrasound tongue image targets. We extend a traditional (vocoder-based) DNN-TTS framework with predicting PCA-compressed ultrasound images, of which the contin…
External link:
http://arxiv.org/abs/2107.05550
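The abstract above mentions predicting PCA-compressed ultrasound tongue images rather than raw pixels. A minimal sketch of that compression step, assuming NumPy and synthetic random frames in place of real ultrasound data (the frame size, component count, and SVD-based fit are illustrative choices, not the paper's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for ultrasound frames: 100 images of 64x64 pixels,
# flattened to row vectors (real data would be recorded tongue images).
frames = rng.normal(size=(100, 64 * 64))

# Fit PCA via SVD on mean-centred data and keep k principal components.
k = 16
mean = frames.mean(axis=0)
centred = frames - mean
_, _, vt = np.linalg.svd(centred, full_matrices=False)
components = vt[:k]                # (k, 4096) orthonormal basis

codes = centred @ components.T     # compress: each frame -> k coefficients
recon = codes @ components + mean  # decompress: approximate reconstruction

print(codes.shape)  # (100, 16)
```

A TTS-style network then only has to predict the k coefficients per frame instead of thousands of pixels, which is the point of the compression.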
Articulatory information has been shown to be effective in improving the performance of HMM-based and DNN-based text-to-speech synthesis. Speech synthesis research traditionally focuses on text-to-speech conversion, where the input is text or an estim…
External link:
http://arxiv.org/abs/2107.02003
Vocoders have received renewed attention as main components in statistical parametric text-to-speech (TTS) synthesis and speech transformation systems. Even though some vocoding techniques produce nearly acceptable synthesized speech, their high computati…
External link:
http://arxiv.org/abs/2106.10481
To date, various speech technology systems have adopted the vocoder approach, a method for synthesizing speech waveforms that plays a major role in the performance of statistical parametric speech synthesis. WaveNet, one of the best models that nearly…
External link:
http://arxiv.org/abs/2106.06863