Multilingual Speech Recognition with a Single End-to-End Model
Authors: | Ron Weiss, Eugene Weinstein, Kanishka Rao, Shubham Toshniwal, Bo Li, Pedro J. Moreno, Tara N. Sainath |
---|---|
Publication year: | 2018 |
Subject: |
FOS: Computer and information sciences; Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer science; Speech recognition; Grapheme; 020206 networking & telecommunications; 02 engineering and technology; Pronunciation; computer.software_genre; Lexicon; Data modeling; Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Scripting language; FOS: Electrical engineering, electronic engineering, information engineering; 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); 020201 artificial intelligence & image processing; Language model; Computation and Language (cs.CL); computer; Word (computer architecture); Electrical Engineering and Systems Science - Audio and Speech Processing |
Source: | ICASSP |
Description: | Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we further improve performance by an additional 7% relative and eliminate confusion between different languages. Accepted in ICASSP 2018 |
Database: | OpenAIRE |
External link: |
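
The description mentions two data-preparation ideas: taking a union of language-specific grapheme sets, and passing a language identifier as an additional input feature. The following is a minimal sketch of both, not the authors' implementation. The toy transcripts, the function names, the use of single Unicode code points as "graphemes", and the choice of a one-hot vector appended to every acoustic frame are all assumptions made for illustration; the record does not say how the identifier is encoded or where it enters the model.

```python
from typing import Dict, List

# Hypothetical toy transcripts keyed by language code (not data from the paper).
TRANSCRIPTS: Dict[str, List[str]] = {
    "hi": ["नमस्ते"],
    "ta": ["வணக்கம்"],
    "bn": ["নমস্কার"],
}


def build_union_vocab(transcripts: Dict[str, List[str]]) -> Dict[str, int]:
    """Map every symbol seen in any language to one shared integer id.

    Assumption: each Unicode code point is treated as one grapheme,
    which is a simplification for scripts with combining marks.
    """
    graphemes = set()
    for sentences in transcripts.values():
        for sentence in sentences:
            graphemes.update(sentence)
    return {g: i for i, g in enumerate(sorted(graphemes))}


def language_one_hot(lang: str, languages: List[str]) -> List[float]:
    """One-hot encoding of the language identifier (an assumed encoding)."""
    if lang not in languages:
        raise ValueError(f"unknown language: {lang}")
    return [1.0 if lang == known else 0.0 for known in languages]


def append_language_id(
    frames: List[List[float]], lang: str, languages: List[str]
) -> List[List[float]]:
    """Concatenate the language one-hot vector to every acoustic frame."""
    one_hot = language_one_hot(lang, languages)
    return [frame + one_hot for frame in frames]


LANGUAGES = sorted(TRANSCRIPTS)           # fixed ordering of language codes
VOCAB = build_union_vocab(TRANSCRIPTS)    # shared output symbol inventory

# Two made-up 2-dimensional acoustic frames for a Hindi utterance.
frames = [[0.1, 0.2], [0.3, 0.4]]
conditioned = append_language_id(frames, "hi", LANGUAGES)
```

Because the scripts barely overlap, the shared vocabulary in this sketch is close to the concatenation of the per-language symbol sets, and each conditioned frame grows by one dimension per language.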