Showing 1 - 10 of 195 for search: '"Eugene Weinstein"'
Published in:
ICASSP
The RNN-Transducer (RNNT) outperforms classic Automatic Speech Recognition (ASR) systems when a large amount of supervised training data is available. For low-resource languages, the RNNT models overfit, and can not directly take advantage of additio
Author:
Yonghui Wu, Zhifeng Chen, Eugene Weinstein, Tara N. Sainath, Seungji Lee, Anjuli Kannan, Arindrima Datta, Ankur Bapna, Bhuvana Ramabhadran
Published in:
INTERSPEECH
Multilingual end-to-end (E2E) models have shown great promise in expansion of automatic speech recognition (ASR) coverage of the world's languages. They have shown improvement over monolingual systems, and have simplified training and serving by elim
Published in:
SLT
Multidialectal languages can pose challenges for acoustic modeling. Past research has shown that with a large training corpus but without explicit modeling of inter-dialect variability, training individual per-dialect models yields superior performan
Published in:
ICASSP
This paper describes a series of experiments with neural networks containing long short-term memory (LSTM) [1] and feedforward sequential memory network (FSMN) [2]–[4] layers trained with the connectionist temporal classification (CTC) [5] criteria
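For orientation, here is a minimal sketch of what CTC training of an LSTM acoustic model looks like in PyTorch; it is not the setup from the paper (the FSMN layers are omitted), and the feature dimension, hidden size, and label inventory below are illustrative assumptions.

import torch
import torch.nn as nn

class LstmCtcModel(nn.Module):
    # Two LSTM layers followed by a projection to label posteriors; index 0 is the CTC blank.
    def __init__(self, feat_dim=80, hidden=320, num_labels=42):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.proj = nn.Linear(hidden, num_labels + 1)

    def forward(self, feats):
        out, _ = self.lstm(feats)
        return self.proj(out).log_softmax(dim=-1)

model = LstmCtcModel()
ctc_loss = nn.CTCLoss(blank=0)
feats = torch.randn(4, 200, 80)             # 4 utterances, 200 frames, 80-dim features
targets = torch.randint(1, 43, (4, 30))     # label ids 1..42; 0 is reserved for the blank
log_probs = model(feats).transpose(0, 1)    # CTCLoss expects (time, batch, classes)
loss = ctc_loss(log_probs, targets,
                torch.full((4,), 200, dtype=torch.long),
                torch.full((4,), 30, dtype=torch.long))
loss.backward()                             # gradients for one CTC training step

CTC lets the network map variable-length frame sequences to shorter label sequences without frame-level alignments, which is why it pairs naturally with recurrent layers such as LSTMs.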
Author:
Ron Weiss, Eugene Weinstein, Kanishka Rao, Shubham Toshniwal, Bo Li, Pedro J. Moreno, Tara N. Sainath
Published in:
ICASSP
Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well
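To make concrete why a single sequence-to-sequence model sidesteps language-specific lexicons and word inventories, here is a toy Python sketch of building one shared grapheme output vocabulary with an optional language tag per utterance; the languages, tags, and transcripts below are made-up examples, not the paper's data.

# Toy transcripts per language; a real system would use full training corpora.
transcripts = {
    "en": ["hello world"],
    "de": ["hallo welt"],
    "hi": ["नमस्ते दुनिया"],
}

# One shared grapheme inventory across all languages, so a single decoder
# can emit transcripts in any of them directly, with no per-language lexicon.
graphemes = sorted({ch for utts in transcripts.values() for u in utts for ch in u})

# Optional language tags that can be prepended to each target sequence.
lang_tags = [f"<{lang}>" for lang in transcripts]
output_vocab = ["<sos>", "<eos>"] + lang_tags + graphemes

def decoder_targets(lang, text):
    # Target sequence for the decoder: language tag, then graphemes, then end marker.
    return [f"<{lang}>"] + list(text) + ["<eos>"]

print(len(output_vocab))
print(decoder_targets("de", "hallo welt"))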
Published in:
ASRU
We explore the feasibility of training long short-term memory (LSTM) recurrent neural networks (RNNs) with syllables, rather than phonemes, as outputs. Syllables are a natural choice of linguistic unit for modeling the acoustics of languages such as
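A toy sketch of the data-side change this implies: the output inventory and training targets are built from syllables rather than phonemes. The lexicon fragment and syllabification below are hypothetical and purely illustrative.

# Hypothetical lexicon fragment: word -> syllable sequence (instead of phonemes).
syllable_lexicon = {
    "akari": ["a", "ka", "ri"],
    "tokyo": ["to", "kyo"],
    "ramen": ["ra", "men"],
}

# The network's output inventory becomes the set of syllables seen in training.
syllable_inventory = sorted({s for syls in syllable_lexicon.values() for s in syls})

def to_targets(words):
    # Map a word-level transcript to the syllable label sequence the LSTM is trained on.
    return [s for w in words for s in syllable_lexicon[w]]

print(syllable_inventory)
print(to_targets(["tokyo", "ramen"]))

The trade-off is a larger output inventory than with phonemes, in exchange for longer-span units that each carry more acoustic context.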
Author:
Eugene Weinstein, Khe Chai Sim, Michiel Bacchiani, Patrick Nguyen, Kanishka Rao, Yonghui Wu, Zhifeng Chen, Bo Li, Tara N. Sainath
Published in:
ICASSP
Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding separate components of a typical system, namely acoustic (AM), pronunciation (PM) and language (LM) models into a single neural netwo
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5aa978d451849f7c0cfca59120f40c1f
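In equation form, the folding that the record above describes: a conventional recognizer decodes with separately trained acoustic (AM), pronunciation (PM) and language (LM) models, while a sequence-to-sequence model parameterizes the posterior over output sequences with one network. A schematic contrast, with notation chosen here purely for illustration:

\hat{W} = \arg\max_W \, p(X \mid W; \theta_{AM}) \, p(W; \theta_{LM}), with the PM expanding W into sub-word units

\hat{W} = \arg\max_W \, p(W \mid X; \theta), a single neural network with parameters \theta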
Published in:
SLT
This paper describes a new technique to automatically obtain large high-quality training speech corpora for acoustic modeling. Traditional approaches select utterances based on confidence thresholds and other heuristics. We propose instead to use an
Published in:
IEEE Transactions on Audio, Speech, and Language Processing. 18:197-207
We present an approach to music identification based on weighted finite-state transducers and Gaussian mixture models, inspired by techniques used in large-vocabulary speech recognition. Our modeling approach is based on learning a set of elementary
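As a rough sketch of the Gaussian-mixture side of such a system, the Python snippet below fits a diagonal-covariance GMM to feature frames of each reference track and identifies a query snippet by average log-likelihood; the transducer-based song index the paper builds on top of learned elementary units is omitted, and the random "features", mixture sizes, and track names are stand-in assumptions.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-in feature frames (e.g. MFCC-like 13-dim vectors) for two reference tracks.
reference_tracks = {
    "track_a": rng.normal(0.0, 1.0, size=(500, 13)),
    "track_b": rng.normal(2.0, 1.0, size=(500, 13)),
}

# One GMM per reference track; the paper instead learns a shared set of elementary
# units and uses weighted finite-state transducers to map unit sequences to songs.
models = {
    name: GaussianMixture(n_components=8, covariance_type="diag", random_state=0).fit(frames)
    for name, frames in reference_tracks.items()
}

def identify(query_frames):
    # Average per-frame log-likelihood under each track's GMM; the highest score wins.
    scores = {name: gmm.score(query_frames) for name, gmm in models.items()}
    return max(scores, key=scores.get), scores

query = rng.normal(2.0, 1.0, size=(100, 13))   # a snippet that should match track_b
best, _ = identify(query)
print(best)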
Published in:
ICASSP
We present an analysis of music modeling and recognition techniques in the context of mobile music matching, substantially improving on the techniques presented in [1]. We accomplish this by adapting the features specifically to this task, and by int