Thai Wav2Vec2.0 with CommonVoice V8

Author:	Phatthiyaphaibun, Wannaphong; Chaksangchaichot, Chompakorn; Limkonchotiwat, Peerat; Chuangsuwanich, Ekapol; Nutanong, Sarana
Publication year:	2022
Subject:	Computer Science - Computation and Language; Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
Document type:	Working Paper
Description:	Automatic Speech Recognition (ASR), the task of converting audio into text, has recently attracted considerable attention in the machine learning community, and many models have been publicly released on HuggingFace. However, most of these ASR models target English; only a minority support Thai. Additionally, most Thai ASR models are closed-source, and existing open-source models lack robustness. To address this problem, we train a new ASR model, starting from a pre-trained XLSR-Wav2Vec model, on the Thai CommonVoice corpus V8, and we train a trigram language model to boost the performance of our ASR model. We hope that our models will benefit individuals and the ASR community in Thailand.
Database:	arXiv
External link:	http://arxiv.org/abs/2208.04799