BCN2BRNO: ASR System Fusion for Albayzin 2020 Speech to Text Challenge

Author:	David Bonet, Jordi Luque, Martin Kocour, Guillermo Cámbara, Mireia Farrús, Karel Veselý, Martin Karafiat, Jan Cernocký
Language:	English
Year of publication:	2021
Subject:	FOS: Computer and information sciences; Fusion; Computer Science - Computation and Language; Audio and Speech Processing (eess.AS); Computer science; Speech recognition; FOS: Electrical engineering, electronic engineering, information engineering; Computation and Language (cs.CL); Electrical Engineering and Systems Science - Audio and Speech Processing
Source:	IberSPEECH 2021, Proceedings, ISCA 2021
Description:	This paper describes the joint effort of BUT and Telefónica Research on the development of Automatic Speech Recognition systems for the Albayzin 2020 Challenge. We compare approaches based on either hybrid or end-to-end models. In hybrid modelling, we explore the impact of a SpecAugment layer on performance. For end-to-end modelling, we used a convolutional neural network with gated linear units (GLUs). We also evaluate this model with an additional n-gram language model to improve word error rates. We further inspect source separation methods to extract speech from noisy environments (i.e., TV shows). More precisely, we assess the effect of using a neural-based music separator named Demucs. A fusion of our best systems achieved 23.33% WER in the official Albayzin 2020 evaluations. Aside from the techniques used in our final submitted systems, we also describe our efforts in retrieving high-quality transcripts for training. Comment: fusion, end-to-end model, hybrid model, semisupervised, automatic speech recognition, convolutional neural network
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::cbc5ba549fb9429d0e12f6d1545366b7 http://arxiv.org/abs/2101.12729