Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Authors:	Yonghui Wu, Zhifeng Chen, Eugene Weinstein, Tara N. Sainath, Seungji Lee, Anjuli Kannan, Arindrima Datta, Ankur Bapna, Bhuvana Ramabhadran
Year of publication:	2019
Subject:	FOS: Computer and information sciences; Sound (cs.SD); Computer Science - Machine Learning; Computer science; Speech recognition; First language; Machine Learning (stat.ML); Computer Science - Sound; Machine Learning (cs.LG); End-to-end principle; Transcription (linguistics); Audio and Speech Processing (eess.AS); Statistics - Machine Learning; FOS: Electrical engineering, electronic engineering, information engineering; Utterance; Electrical Engineering and Systems Science - Audio and Speech Processing
Source:	INTERSPEECH
Description:	Multilingual end-to-end (E2E) models have shown great promise in expanding automatic speech recognition (ASR) coverage of the world's languages. They have shown improvement over monolingual systems, and have simplified training and serving by eliminating language-specific acoustic, pronunciation, and language models. This work presents an E2E multilingual system which is equipped to operate in low-latency interactive applications, as well as handle a key challenge of real-world data: the imbalance in training data across languages. Using nine Indic languages, we compare a variety of techniques, and find that a combination of conditioning on a language vector and training language-specific adapter layers produces the best model. The resulting E2E multilingual model achieves a lower word error rate (WER) than both monolingual E2E models (eight of nine languages) and monolingual conventional systems (all nine languages). Accepted at Interspeech 2019.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6894d46cb3e24c527c9f7190e6ded635 https://doi.org/10.21437/interspeech.2019-2858