Comparison of Data Augmentation and Adaptation Strategies for Code-switched Automatic Speech Recognition
Author: Min Ma, Jesse Emond, Fadi Biadsy, Bhuvana Ramabhadran, Andrew Rosenberg
Year of publication: 2019
Subject: Voice search; Dictation; Speech recognition; Language model; Principle of maximum entropy; Transliteration; Realization (linguistics); Bengali; Pattern recognition
Source: ICASSP
DOI: 10.1109/icassp.2019.8682824
Description: Code-switching occurs when a speaker alternates between two or more languages or dialects. It is a pervasive phenomenon in most Indic spoken languages. Code-switching poses a challenge for language modeling because it complicates the orthographic realization of text, and code-switched data is generally scarce. In this paper, we investigate data augmentation and adaptation strategies for language modeling. Using Bengali and English as an example, we study augmenting the code-switched transcripts with separate transliterated Bengali and English corpora. We present results on two speech recognition tasks, namely voice search and dictation. We show improvements on both tasks with Maximum Entropy (MaxEnt) and Long Short-Term Memory (LSTM) language models (LMs). We also explore different adaptation strategies for the MaxEnt LM and the LSTM LM, demonstrating that the transliteration-based, data-augmented LSTM LM matches the adapted MaxEnt LM, which is trained on more Bengali-English data.
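The description above outlines transliteration-based data augmentation only at a high level. The sketch below is a minimal, hypothetical illustration of the general idea: pooling scarce code-switched transcripts with monolingual Bengali text mapped into the transcripts' Latin orthography plus monolingual English text. The `transliterate_to_latin` lookup table, the corpus names, and the simple pooling scheme are assumptions made for illustration only and are not taken from the paper.

```python
# Minimal illustrative sketch (not the paper's actual pipeline): build an
# augmented LM training corpus by pooling code-switched transcripts with
# transliterated monolingual Bengali and monolingual English text.
from typing import Iterable, List


def transliterate_to_latin(sentence: str) -> str:
    """Toy stand-in for a Bengali-script to Latin-script transliterator.

    A real system would use a trained transliteration model or a rule-based
    tool; this tiny lookup table exists only to keep the sketch runnable.
    """
    toy_map = {"আমি": "ami", "গান": "gaan", "শুনি": "shuni"}
    return " ".join(toy_map.get(token, token) for token in sentence.split())


def build_augmented_corpus(
    code_switched: Iterable[str],
    bengali_monolingual: Iterable[str],
    english_monolingual: Iterable[str],
) -> List[str]:
    corpus: List[str] = []
    # Keep the (scarce) code-switched transcripts unchanged.
    corpus.extend(code_switched)
    # Transliterate monolingual Bengali so it shares the transcripts' orthography.
    corpus.extend(transliterate_to_latin(s) for s in bengali_monolingual)
    # Monolingual English is already in Latin script; add it directly.
    corpus.extend(english_monolingual)
    return corpus


if __name__ == "__main__":
    augmented = build_augmented_corpus(
        code_switched=["ami ekta song play koro"],
        bengali_monolingual=["আমি গান শুনি"],
        english_monolingual=["play my workout playlist"],
    )
    for line in augmented:
        print(line)
```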
Database: OpenAIRE
External link: