Evaluation of Automatic Speech Recognition Approaches

Author: Vasconcelos, Daniel J R, Silva, Ticiana Linhares Coelho da, Cruz, Lívia Almada, Magalhães, Regis Pires, Fernandes, Guilherme Sales, Sampaio, Matheus Xavier
Language: Portuguese
Year of publication: 2022
Subject:
DOI: 10.5281/zenodo.5930717
Description: Automatic Speech Recognition (ASR) is an essential task for many applications, such as automatic caption generation for videos, voice search, voice commands for smart homes, and chatbots. Given the increasing popularity of these applications and the advances in deep learning models for transcribing speech into text, this work evaluates the performance of commercial ASR solutions that use deep learning models, such as Facebook Wit.ai, Microsoft Azure Speech, Google Cloud Speech-to-Text, Wav2Vec, and AWS Transcribe. We performed the experiments with two real, public datasets: Mozilla Common Voice and Voxforge. The results show that the evaluated solutions differ only slightly. However, Facebook Wit.ai outperforms the other analyzed approaches on the collected quality metrics, such as WER, BLEU, and METEOR. We also fine-tune the Jasper neural network for ASR on four different datasets that have no overlap with the ones on which we collect the quality metrics. We study the performance of the Jasper model on the two public datasets, comparing its results with those of the other pre-trained models.
In this version we remove overlapping files that were knowingly used in training.
Database: OpenAIRE