Showing 1 - 10 of 207 for search: '"Bourlard, P"'
In this work, we investigate if the wav2vec 2.0 self-supervised pretraining helps mitigate the overfitting issues with connectionist temporal classification (CTC) training to reduce its performance gap with flat-start lattice-free MMI (E2E-LFMMI) for
External link:
http://arxiv.org/abs/2104.02558
In this work, we propose lattice-free MMI (LFMMI) for supervised adaptation of a self-supervised pretrained acoustic model. We pretrain a Transformer model on a thousand hours of untranscribed Librispeech data followed by supervised adaptation with LFMMI
External link:
http://arxiv.org/abs/2012.14252
Automatic dysarthric speech detection can provide reliable and cost-effective computer-aided tools to assist the clinical diagnosis and management of dysarthria. In this paper we propose a novel automatic dysarthric speech detection approach based on
External link:
http://arxiv.org/abs/2011.07545
Automatic techniques in the context of motor speech disorders (MSDs) are typically two-class techniques aiming to discriminate between dysarthria and neurotypical speech or between dysarthria and apraxia of speech (AoS). Further, although such techni
External link:
http://arxiv.org/abs/2011.07542
Authors:
Madikeri, Srikanth; Tong, Sibo; Zuluaga-Gomez, Juan; Vyas, Apoorv; Motlicek, Petr; Bourlard, Hervé
We present a simple wrapper that is useful to train acoustic models in PyTorch using Kaldi's LF-MMI training framework. The wrapper, called pkwrap (short form of PyTorch kaldi wrapper), enables the user to utilize the flexibility provided by PyTorch
External link:
http://arxiv.org/abs/2010.03466
This paper focuses on the problem of query by example spoken term detection (QbE-STD) in a zero-resource scenario. State-of-the-art approaches primarily rely on dynamic time warping (DTW) based template matching techniques using phone posterior or bott
External link:
http://arxiv.org/abs/1911.08332
State-of-the-art solutions to query by example spoken term detection (QbE-STD) usually rely on bottleneck feature representation of the query and audio document to perform dynamic time warping (DTW) based template matching. Here, we present a study o
External link:
http://arxiv.org/abs/1907.00443
Multilingual models for Automatic Speech Recognition (ASR) are attractive as they have been shown to benefit from more training data, and better lend themselves to adaptation to under-resourced languages. However, initialisation from monolingual cont
External link:
http://arxiv.org/abs/1711.10025
We propose an information theoretic framework for quantitative assessment of acoustic modeling for hidden Markov model (HMM) based automatic speech recognition (ASR). Acoustic modeling yields the probabilities of HMM sub-word states for a short tempo
External link:
http://arxiv.org/abs/1709.01144
Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems corr
External link:
http://arxiv.org/abs/1610.05688