Showing 1 - 10 of 153 results for search: '"Bourlard, Herve"'
In this work, we investigate whether wav2vec 2.0 self-supervised pretraining helps mitigate the overfitting issues of connectionist temporal classification (CTC) training, to reduce its performance gap with flat-start lattice-free MMI (E2E-LFMMI) for …
External link:
http://arxiv.org/abs/2104.02558
In this work, we propose lattice-free MMI (LFMMI) for supervised adaptation of a self-supervised pretrained acoustic model. We pretrain a Transformer model on a thousand hours of untranscribed Librispeech data, followed by supervised adaptation with LFMMI …
External link:
http://arxiv.org/abs/2012.14252
Author:
Madikeri, Srikanth, Tong, Sibo, Zuluaga-Gomez, Juan, Vyas, Apoorv, Motlicek, Petr, Bourlard, Hervé
We present a simple wrapper that is useful for training acoustic models in PyTorch using Kaldi's LF-MMI training framework. The wrapper, called pkwrap (short for PyTorch-Kaldi wrapper), enables the user to utilize the flexibility provided by PyTorch …
External link:
http://arxiv.org/abs/2010.03466
This paper focuses on the problem of query-by-example spoken term detection (QbE-STD) in a zero-resource scenario. State-of-the-art approaches primarily rely on dynamic time warping (DTW) based template matching techniques using phone posterior or bottleneck …
External link:
http://arxiv.org/abs/1911.08332
State-of-the-art solutions to query-by-example spoken term detection (QbE-STD) usually rely on bottleneck feature representations of the query and audio document to perform dynamic time warping (DTW) based template matching. Here, we present a study of …
External link:
http://arxiv.org/abs/1907.00443
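Both QbE-STD abstracts above rely on DTW-based template matching between a spoken query and an audio document. A minimal sketch of the cumulative-cost DTW recurrence, assuming frame-level features stored as NumPy arrays; the function name and the Euclidean local distance are illustrative choices, not the papers' exact setup:

```python
import numpy as np

def dtw_cost(query, doc):
    """Cumulative DTW alignment cost between two feature sequences.

    query: (n, d) array and doc: (m, d) array of frame-level features
    (e.g. phone posteriors or bottleneck features).
    """
    n, m = len(query), len(doc)
    # Frame-pair distance matrix (Euclidean here; cosine or negative
    # log inner product are common alternatives for posterior features).
    dist = np.linalg.norm(query[:, None, :] - doc[None, :, :], axis=-1)
    acc = np.full((n, m), np.inf)
    acc[0, 0] = dist[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Standard step pattern: insertion, deletion, or match.
            best = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = dist[i, j] + best
    return acc[-1, -1]
```

In a QbE-STD system this cost (typically length-normalized) would be computed for the query against sliding windows of the document, with low-cost regions flagged as detections.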
Multilingual models for Automatic Speech Recognition (ASR) are attractive, as they have been shown to benefit from more training data and to better lend themselves to adaptation to under-resourced languages. However, initialisation from monolingual cont…
External link:
http://arxiv.org/abs/1711.10025
We propose an information-theoretic framework for quantitative assessment of acoustic modeling for hidden Markov model (HMM) based automatic speech recognition (ASR). Acoustic modeling yields the probabilities of HMM sub-word states for a short temporal …
External link:
http://arxiv.org/abs/1709.01144
Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov models (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems corr…
External link:
http://arxiv.org/abs/1610.05688
We propose to model the acoustic space of deep neural network (DNN) class-conditional posterior probabilities as a union of low-dimensional subspaces. To that end, the training posteriors are used for dictionary learning and sparse coding. Sparse rep…
External link:
http://arxiv.org/abs/1601.05936
Published in:
Speech Communication, Volume 84, November 2016, Pages 36-45
The speech signal conveys information on different time scales, from short time scales (segmental, associated with phonological and phonetic information) to long time scales (supra-segmental, associated with syllabic and prosodic information). Linguistic …
External link:
http://arxiv.org/abs/1601.05647