Comparison of Data Augmentation and Adaptation Strategies for Code-switched Automatic Speech Recognition
Author: Min Ma, Jesse Emond, Fadi Biadsy, Bhuvana Ramabhadran, Andrew Rosenberg
Year of publication: 2019
Subject: Voice search; Dictation; Speech recognition; Language model; Principle of maximum entropy; Transliteration; Realization (linguistics); Bengali; Pattern recognition
Source: ICASSP
DOI: 10.1109/icassp.2019.8682824
Description: Code-switching occurs when a speaker alternates between two or more languages or dialects. It is a pervasive phenomenon in most Indic spoken languages. Code-switching poses a challenge for language modeling because it complicates the orthographic realization of text, and code-switched data is generally scarce. In this paper, we investigate data augmentation and adaptation strategies for language modeling. Using Bengali and English as an example, we study augmenting the code-switched transcripts with separate transliterated Bengali and English corpora. We present results on two speech recognition tasks, namely voice search and dictation. We show improvements on both tasks with Maximum Entropy (MaxEnt) and Long Short-Term Memory (LSTM) language models (LMs). We also explore different adaptation strategies for the MaxEnt LM and the LSTM LM, demonstrating that the transliteration-based, data-augmented LSTM LM matches the adapted MaxEnt LM, which is trained on more Bengali-English data.
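The description above outlines transliteration-based data augmentation only at a high level. The sketch below is a minimal, hypothetical illustration of the general idea: pooling scarce code-switched transcripts with monolingual Bengali text mapped into the transcripts' Latin orthography plus monolingual English text. The `transliterate_to_latin` lookup table, the corpus names, and the simple pooling scheme are assumptions made for illustration only and are not taken from the paper.

```python
# Minimal illustrative sketch (not the paper's actual pipeline): build an
# augmented LM training corpus by pooling code-switched transcripts with
# transliterated monolingual Bengali and monolingual English text.
from typing import Iterable, List


def transliterate_to_latin(sentence: str) -> str:
    """Toy stand-in for a Bengali-script to Latin-script transliterator.

    A real system would use a trained transliteration model or a rule-based
    tool; this tiny lookup table exists only to keep the sketch runnable.
    """
    toy_map = {"আমি": "ami", "গান": "gaan", "শুনি": "shuni"}
    return " ".join(toy_map.get(token, token) for token in sentence.split())


def build_augmented_corpus(
    code_switched: Iterable[str],
    bengali_monolingual: Iterable[str],
    english_monolingual: Iterable[str],
) -> List[str]:
    corpus: List[str] = []
    # Keep the (scarce) code-switched transcripts unchanged.
    corpus.extend(code_switched)
    # Transliterate monolingual Bengali so it shares the transcripts' orthography.
    corpus.extend(transliterate_to_latin(s) for s in bengali_monolingual)
    # Monolingual English is already in Latin script; add it directly.
    corpus.extend(english_monolingual)
    return corpus


if __name__ == "__main__":
    augmented = build_augmented_corpus(
        code_switched=["ami ekta song play koro"],
        bengali_monolingual=["আমি গান শুনি"],
        english_monolingual=["play my workout playlist"],
    )
    for line in augmented:
        print(line)
```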
Database: OpenAIRE
External link: