End-to-End Speech Recognition Model Based on Deep Learning for Albanian

Authors: Arbana Kadriu, Amarildo Rista
Year of publication: 2021
Subject:
Source: MIPRO
DOI: 10.23919/mipro52101.2021.9596713
Description: Deep learning is now considered one of the most important advances in machine learning, and it has led to significant and widespread improvements in how people interact with the world. It is a technique that constructs artificial neural networks to mimic the structure and function of the human brain. Recently, deep learning has become well known for its applicability to speech recognition, mainly because of the flexibility and predictive power of deep neural networks. This paper introduces an end-to-end speech recognition model for the Albanian language. The model is based on a Recurrent Neural Network (RNN) architecture and is implemented in PyTorch. It is composed of two main neural network modules: N layers of Residual Convolutional Neural Networks (ResCNN) that learn the relevant audio features, and a set of Bidirectional Recurrent Neural Networks (BiRNN) that leverage the learned ResCNN audio features. The model is trained and evaluated on an Albanian corpus created for this purpose and on an English corpus derived from LibriSpeech. The experimental results show a very satisfactory word error rate (WER), character error rate (CER), and loss.
Database: OpenAIRE
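
The record reports WER and CER without defining them. As a minimal sketch, assuming the standard definitions (edit distance between reference and hypothesis, divided by the reference length in words or characters), the two metrics could be computed as below. The function names and the sample sentence pair are illustrative only and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical transcript pair, used only to illustrate the calls.
reference = "si jeni sot"
hypothesis = "si jeni so"
print(f"WER = {wer(reference, hypothesis):.3f}")
print(f"CER = {cer(reference, hypothesis):.3f}")
```

In this example one of three words is wrong while only one of eleven characters is missing, so CER comes out lower than WER. That gap is typical when recognition errors are small misspellings rather than entirely wrong words.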