Fast offline transformer-based end-to-end automatic speech recognition for real-world applications

Authors:	Yoo Rhee Oh, Kiyoung Park
Language:	English
Year of publication:	2022
Subject:	connectionist temporal classification; end-to-end speech recognition; transformer; Telecommunication; TK5101-6720; Electronics; TK7800-8360
Source:	ETRI Journal, Vol 44, Iss 3, Pp 476-490 (2022)
Document type:	article
ISSN:	1225-6463
DOI:	10.4218/etrij.2021-0106
Description:	Automatic speech recognition (ASR) is now widely used in real-world applications, and converting large amounts of speech into text accurately with limited resources has become more important than ever. In this study, we propose a method to rapidly recognize a large speech database with a transformer-based end-to-end model. Transformers have improved the state-of-the-art performance in many fields, but they are not easy to use for long sequences. We propose and test several techniques to accelerate the recognition of real-world speech: decoding via multiple-utterance-batched beam search, detecting the end of speech based on connectionist temporal classification (CTC), restricting the CTC-prefix score, and splitting long speech into short segments. Experiments on the Librispeech dataset and real-world Korean ASR tasks verify the proposed methods. In these experiments, the proposed system converts 8 h of speech recorded at real-world meetings into text in less than 3 min with a 10.73% character error rate, a 27.1% relative reduction compared with conventional systems.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/8298a8a44f9d45ee954cb9e3ef9bef5b
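
The headline figures in the description can be unpacked with a little arithmetic. The sketch below is not taken from the article: the function names are hypothetical, the 3 min figure is treated as an upper bound on processing time, and the baseline error rate is only what the stated 27.1% relative reduction would imply if it is measured against a single conventional system.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration (lower is faster)."""
    return processing_seconds / audio_seconds


def implied_baseline(reduced_rate: float, relative_reduction: float) -> float:
    """Baseline rate b such that reduced_rate = b * (1 - relative_reduction)."""
    return reduced_rate / (1.0 - relative_reduction)


# Figures quoted in the description: 8 h of audio, under 3 min of processing.
audio_s = 8 * 3600       # 28800 seconds of speech
processing_s = 3 * 60    # 180 seconds, used here as an upper bound

rtf_upper_bound = real_time_factor(processing_s, audio_s)
min_speedup = audio_s / processing_s

# Assumption: the 27.1% relative reduction refers to one baseline CER.
baseline_cer = implied_baseline(10.73, 0.271)

print(f"real-time factor below {rtf_upper_bound:.5f}")
print(f"at least {min_speedup:.0f}x faster than real time")
print(f"implied baseline CER of about {baseline_cer:.2f}%")
```

Under these assumptions, the system runs at least 160 times faster than real time, and the conventional systems would sit at a character error rate of roughly 14.7%.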