Showing 1 - 10 of 153 results for search: '"Bourlard, Herve"'
In this work, we investigate whether wav2vec 2.0 self-supervised pretraining helps mitigate the overfitting issues of connectionist temporal classification (CTC) training, to reduce its performance gap with flat-start lattice-free MMI (E2E-LFMMI) for …
External link:
http://arxiv.org/abs/2104.02558
In this work, we propose lattice-free MMI (LFMMI) for supervised adaptation of a self-supervised pretrained acoustic model. We pretrain a Transformer model on a thousand hours of untranscribed Librispeech data, followed by supervised adaptation with LFMMI …
External link:
http://arxiv.org/abs/2012.14252
Author:
Madikeri, Srikanth, Tong, Sibo, Zuluaga-Gomez, Juan, Vyas, Apoorv, Motlicek, Petr, Bourlard, Hervé
We present a simple wrapper that is useful for training acoustic models in PyTorch using Kaldi's LF-MMI training framework. The wrapper, called pkwrap (short for PyTorch-Kaldi wrapper), enables the user to utilize the flexibility provided by PyTorch …
External link:
http://arxiv.org/abs/2010.03466
This paper focuses on the problem of query-by-example spoken term detection (QbE-STD) in a zero-resource scenario. State-of-the-art approaches primarily rely on dynamic time warping (DTW) based template matching techniques using phone posterior or bottleneck …
External link:
http://arxiv.org/abs/1911.08332
State-of-the-art solutions to query-by-example spoken term detection (QbE-STD) usually rely on bottleneck feature representations of the query and audio document to perform dynamic time warping (DTW) based template matching. Here, we present a study of …
External link:
http://arxiv.org/abs/1907.00443
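Both QbE-STD abstracts above rely on DTW-based template matching between a spoken query and an audio document. A minimal sketch of the cumulative-cost DTW recurrence, assuming frame-level features stored as NumPy arrays; the function name and the Euclidean local distance are illustrative choices, not the papers' exact setup:

```python
import numpy as np

def dtw_cost(query, doc):
    """Cumulative DTW alignment cost between two feature sequences.

    query: (n, d) array and doc: (m, d) array of frame-level features
    (e.g. phone posteriors or bottleneck features).
    """
    n, m = len(query), len(doc)
    # Frame-pair distance matrix (Euclidean here; cosine or negative
    # log inner product are common alternatives for posterior features).
    dist = np.linalg.norm(query[:, None, :] - doc[None, :, :], axis=-1)
    acc = np.full((n, m), np.inf)
    acc[0, 0] = dist[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Standard step pattern: insertion, deletion, or match.
            best = min(
                acc[i - 1, j] if i > 0 else np.inf,
                acc[i, j - 1] if j > 0 else np.inf,
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            acc[i, j] = dist[i, j] + best
    return acc[-1, -1]
```

In a QbE-STD system this cost (typically length-normalized) would be computed for the query against sliding windows of the document, with low-cost regions flagged as detections.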
Multilingual models for Automatic Speech Recognition (ASR) are attractive, as they have been shown to benefit from more training data and to better lend themselves to adaptation to under-resourced languages. However, initialisation from monolingual cont…
External link:
http://arxiv.org/abs/1711.10025
We propose an information-theoretic framework for quantitative assessment of acoustic modeling for hidden Markov model (HMM) based automatic speech recognition (ASR). Acoustic modeling yields the probabilities of HMM sub-word states for a short temporal …
External link:
http://arxiv.org/abs/1709.01144
Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov models (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems corr…
External link:
http://arxiv.org/abs/1610.05688
We propose to model the acoustic space of deep neural network (DNN) class-conditional posterior probabilities as a union of low-dimensional subspaces. To that end, the training posteriors are used for dictionary learning and sparse coding. Sparse rep…
External link:
http://arxiv.org/abs/1601.05936
Published in:
Speech Communication, Volume 84, November 2016, Pages 36-45
The speech signal conveys information on different time scales, from short time scales (segmental, associated with phonological and phonetic information) to long time scales (supra-segmental, associated with syllabic and prosodic information). Linguistic …
External link:
http://arxiv.org/abs/1601.05647