A Model for Albanian Speech Recognition Using End-to-End Deep Learning Techniques
Authors: | Amarildo Rista, Arbana Kadriu |
---|---|
Year of publication: | 2022 |
Source: | Interdisciplinary Journal of Research and Development. 9:1 |
ISSN: | 2313-058X 2410-3411 |
DOI: | 10.56345/ijrdv9n301 |
Description: | An end-to-end automatic speech recognition (ASR) system folds the acoustic model (AM), language model (LM), and pronunciation model (PM) into a single neural network. Optimizing all of these components jointly improves the performance of the model. In this paper, we introduce a model for Albanian speech recognition (SR) using end-to-end deep learning techniques. The model is built from two main modules: a residual convolutional neural network (ResCNN), which learns the relevant audio features, and a bidirectional recurrent neural network (BiRNN), which leverages the features learned by the ResCNN. To train and evaluate the model, we built a corpus for Albanian speech recognition (CASR), which contains 100 hours of audio data along with their transcripts. In designing the corpus we took into account speaker attributes such as age, gender, accent, speed of utterance, and dialect, so that it is as heterogeneous as possible. The model is evaluated with the word error rate (WER) and character error rate (CER) metrics. It achieves a WER of 5% and a CER of 1%. |
Database: | OpenAIRE |
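
The WER and CER metrics named in the description are conventionally defined as the Levenshtein edit distance between the reference transcript and the recognizer output, divided by the reference length (counted in words for WER, in characters for CER). The record does not say how the authors computed them, so the following is only a minimal sketch of that standard definition; the function names and the sample sentences are hypothetical and are not taken from the paper or from the CASR corpus.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcript pair, written without diacritics for simplicity.
    reference = "sot eshte dite e bukur"
    hypothesis = "sot eshte dita e bukur"
    print(f"WER = {wer(reference, hypothesis):.3f}")
    print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this sample one of the five reference words is wrong, so the WER is 0.2, while only one of the 22 reference characters differs, so the CER is much lower. This is the usual reason a system reports a CER well below its WER, as in the 5% and 1% figures above.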