Showing 1 - 8 of 8 for the search: '"Sadhu, Samik"'
Author:
Sadhu, Samik, Hermansky, Hynek
We show that training a multi-headed self-attention-based deep network to predict deleted, information-dense 2-8 Hz speech modulations over a 1.5-second section of a speech utterance is an effective way to make machines learn to extract speech modulations …
External link:
http://arxiv.org/abs/2303.12908
Author:
Sustek, Martin, Sadhu, Samik, Burget, Lukas, Hermansky, Hynek, Villalba, Jesus, Moro-Velazquez, Laureano, Dehak, Najim
The recently proposed Joint Energy-based Model (JEM) interprets a discriminatively trained classifier $p(y|x)$ as an energy model, which is also trained as a generative model describing the distribution of the input observations $p(x)$. The JEM training …
External link:
http://arxiv.org/abs/2303.04187
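The JEM decomposition summarized above is simple to state in code: given classifier logits $f(x)$, the LogSumExp over classes serves as an unnormalized $\log p(x)$, while the softmax of the same logits remains the posterior $p(y|x)$. A minimal NumPy sketch (function names and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

def log_unnormalized_density(logits):
    """Energy-based reading of a classifier: with logits f(x)[y], JEM
    defines log p(x) = LogSumExp_y f(x)[y] - log Z, so the LogSumExp of
    the logits is an unnormalized log-density of the input x."""
    m = logits.max(axis=-1, keepdims=True)  # stabilize the exponentials
    return (m + np.log(np.exp(logits - m).sum(axis=-1, keepdims=True))).squeeze(-1)

def posterior(logits):
    """The usual classifier posterior p(y|x): a softmax of the same logits."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

Because both quantities come from one set of logits, a single network can be trained jointly with a classification loss on `posterior` and a generative loss on `log_unnormalized_density`.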
Author:
Sadhu, Samik, Hermansky, Hynek
We present a method to remove unknown convolutive noise introduced to speech by reverberations of recording environments, utilizing some amount of training speech data from the reverberant environment, and any available non-reverberant speech data. …
External link:
http://arxiv.org/abs/2210.00117
Author:
Sadhu, Samik, Hermansky, Hynek
How important are different temporal speech modulations for speech recognition? We answer this question from two complementary perspectives. Firstly, we quantify the amount of phonetic information in the modulation spectrum of speech …
External link:
http://arxiv.org/abs/2204.00065
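The modulation spectrum this abstract refers to can be obtained by Fourier-analysing the temporal envelope of a frequency sub-band; for speech, energy concentrates around 2-8 Hz. A minimal sketch (the function name, windowing, and squared-envelope choice are mine, not necessarily the paper's exact pipeline):

```python
import numpy as np
from scipy.signal import hilbert

def modulation_spectrum(band, fs):
    """Modulation spectrum of one frequency sub-band signal: take its
    squared Hilbert envelope, remove the mean, window, and Fourier-analyse;
    the frequency axis is modulation frequency in Hz."""
    env = np.abs(hilbert(band)) ** 2          # squared Hilbert envelope
    env = env - env.mean()                    # drop the DC component
    spec = np.abs(np.fft.rfft(env * np.hanning(len(env))))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, spec
```

For a carrier amplitude-modulated at 4 Hz, the spectrum returned by this function peaks at 4 Hz, which is how one can verify the analysis end to end.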
Author:
Sadhu, Samik, Hermansky, Hynek
Conventional Frequency Domain Linear Prediction (FDLP) technique models the squared Hilbert envelope of speech with varied degrees of approximation, which can be sampled at the required frame rate and used as features for Automatic Speech Recognition …
External link:
http://arxiv.org/abs/2203.13216
Author:
Sadhu, Samik, Hermansky, Hynek
We propose a technique to compute spectrograms using Frequency Domain Linear Prediction (FDLP) that uses all-pole models to fit the squared Hilbert envelope of speech in different frequency sub-bands. The spectrogram of a complete speech utterance is …
External link:
http://arxiv.org/abs/2103.14129
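FDLP exploits time-frequency duality: linear prediction applied to the DCT of a signal yields an all-pole model of its squared Hilbert envelope, just as ordinary LPC on the time signal yields an all-pole model of its power spectrum. A rough sketch of this idea using the autocorrelation method of LPC (the function name and defaults are illustrative, not the authors' implementation):

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def fdlp_envelope(x, order=20, npoints=None):
    """Sketch of Frequency Domain Linear Prediction: take the DCT of the
    signal and fit an all-pole (LPC) model to the DCT coefficients; by
    duality, evaluating the all-pole model on a frequency grid gives an
    approximation of the squared Hilbert envelope of x over time."""
    npoints = npoints or len(x)
    c = dct(x, type=2, norm='ortho')                      # DCT of the signal
    # autocorrelation-method LPC on the DCT coefficients (Yule-Walker)
    r = np.correlate(c, c, mode='full')[len(c) - 1:len(c) + order]
    a = solve_toeplitz(r[:order], r[1:order + 1])         # predictor coeffs
    g = r[0] - a @ r[1:order + 1]                         # prediction error power
    # all-pole transfer function |G/A(e^{jw})|^2 on a uniform grid
    w = np.linspace(0, np.pi, npoints)
    A = 1 - np.exp(-1j * np.outer(w, np.arange(1, order + 1))) @ a
    return g / np.abs(A) ** 2
```

In the papers listed here this is done per frequency sub-band over long analysis windows; the sketch above shows only the single-band core of the duality argument.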
Author:
Sadhu, Samik, He, Di, Huang, Che-Wei, Mallidi, Sri Harish, Wu, Minhua, Rastrow, Ariya, Stolcke, Andreas, Droppo, Jasha, Maas, Roland
Wav2vec-C introduces a novel representation learning technique combining elements from wav2vec 2.0 and VQ-VAE. Our model learns to reproduce quantized representations from partially masked speech encoding using a contrastive loss, in a way similar to …
External link:
http://arxiv.org/abs/2103.08393
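The contrastive objective mentioned above can be sketched as an InfoNCE loss between the network's outputs at masked positions and their quantized targets: each prediction should be closest to its own target among all targets used as negatives. A toy NumPy version (shapes, the temperature value, and in-batch negatives are assumptions, not the wav2vec-C recipe):

```python
import numpy as np

def contrastive_loss(pred, targets, temperature=0.1):
    """InfoNCE-style contrastive loss as used in wav2vec-style pretraining.
    pred, targets: [T, D] arrays; row i of `targets` is the positive for
    row i of `pred`, and all other rows act as negatives."""
    p = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    sim = p @ t.T / temperature                       # [T, T] cosine similarities
    sim = sim - sim.max(axis=1, keepdims=True)        # numerical stability
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logp))                    # positives on the diagonal
```

The loss is near zero when predictions align with their own targets and grows when a prediction is closer to some other frame's target.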
The quality of data plays an important role in most deep learning tasks. In the speech community, transcription of speech recordings is indispensable. Since the transcription is usually generated artificially, automatically finding errors in manual transcriptions …
External link:
http://arxiv.org/abs/1904.04294