Showing 1 - 4 of 4 for search: '"Korostik, Roman"'
This paper proposes a generative pretraining foundation model for high-quality speech restoration tasks. By directly operating on complex-valued short-time Fourier transform coefficients, our model does not rely on any vocoders for time-domain signal
External link:
http://arxiv.org/abs/2409.16117
This paper proposes a generative speech enhancement model based on Schrödinger bridge (SB). The proposed model employs a tractable SB to formulate a data-to-data process between the clean speech distribution and the observed noisy speech distr
External link:
http://arxiv.org/abs/2407.16074
We propose an end-to-end Automatic Speech Recognition (ASR) system that can be trained on transcribed speech data, text-only data, or a mixture of both. The proposed model uses an integrated auxiliary block for text-based training. This block combine
External link:
http://arxiv.org/abs/2302.14036
Author:
Laptev, Aleksandr, Korostik, Roman, Svischev, Aleksey, Andrusenko, Andrei, Medennikov, Ivan, Rybin, Sergey
Data augmentation is one of the most effective ways to make end-to-end automatic speech recognition (ASR) perform close to the conventional hybrid approach, especially when dealing with low-resource tasks. Using recent advances in speech synthesis (t
External link:
http://arxiv.org/abs/2005.07157