Showing 1 - 10 of 86 for search: '"Németh, Géza"'
Neural network-based Text-to-Speech has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron2, FastSpeech, FastPitch) usually generate a Mel-spectrogram from text and then synthesize speech using a vocoder (e.g., Wa
External link:
http://arxiv.org/abs/2208.07122
Author:
Zainkó, Csaba; Tóth, László; Shandiz, Amin Honarmandi; Gosztolya, Gábor; Markó, Alexandra; Németh, Géza; Csapó, Tamás Gábor
For articulatory-to-acoustic mapping, typically only limited parallel training data is available, making it impossible to apply fully end-to-end solutions like Tacotron2. In this paper, we experimented with transfer learning and adaptation of a Tacot
External link:
http://arxiv.org/abs/2107.12051
Vocoders have received renewed attention as main components in statistical parametric text-to-speech (TTS) synthesis and speech transformation systems. Even though some vocoding techniques give almost acceptable synthesized speech, their high computati
External link:
http://arxiv.org/abs/2106.10481
To date, various speech technology systems have adopted the vocoder approach, a method for synthesizing speech waveforms that plays a major role in the performance of statistical parametric speech synthesis. WaveNet is one of the best models that nearly
External link:
http://arxiv.org/abs/2106.06863
Author:
Alwaisi, Shaimaa (shaima.alwaisi@edu.bme.hu); Németh, Géza (nemeth@tmit.bme.hu)
Published in:
Infocommunications Journal, Mar 2024, Vol. 16, Issue 1, pp. 35-46 (12 pp.)
Self-attention networks (SANs) have shown promising performance in various Natural Language Processing (NLP) scenarios, especially in machine translation. One of the main strengths of SANs is capturing long-range and multi-scale dependenc
External link:
http://arxiv.org/abs/2006.15585
The attention mechanism is one of the most successful techniques in deep learning-based Natural Language Processing (NLP). The transformer network architecture is completely based on attention mechanisms, and it outperforms sequence-to-sequence models in
External link:
http://arxiv.org/abs/2004.06338
Author:
Csapó, Tamás Gábor; Al-Radhi, Mohammed Salah; Németh, Géza; Gosztolya, Gábor; Grósz, Tamás; Tóth, László; Markó, Alexandra
Recently it was shown that within the Silent Speech Interface (SSI) field, the prediction of F0 is possible from Ultrasound Tongue Images (UTI) as the articulatory input, using Deep Neural Networks for articulatory-to-acoustic mapping. Moreover, text
External link:
http://arxiv.org/abs/1906.09885
Recently, in statistical parametric speech synthesis, we proposed a continuous sinusoidal model (CSM) using continuous F0 (contF0) in combination with Maximum Voiced Frequency (MVF), which successfully achieved state-of-the-art vocoder performance
External link:
http://arxiv.org/abs/1904.06075
Author:
Al-Radhi, Mohammed Salah; Abdo, Omnia; Csapó, Tamás Gábor; Abdou, Sherif; Németh, Géza; Fashal, Mervat
Published in:
Computer Speech & Language, March 2020, Vol. 60