RNN-T For Latency Controlled ASR With Improved Beam Search

Authors:	Jain, Mahaveer; Schubert, Kjell; Mahadeokar, Jay; Yeh, Ching-Feng; Kalgaonkar, Kaustubh; Sriram, Anuroop; Fuegen, Christian; Seltzer, Michael L.
Year of publication:	2019
Subject:	Computer Science - Computation and Language; Computer Science - Machine Learning; Electrical Engineering and Systems Science - Audio and Speech Processing
Document type:	Working Paper
Description:	Neural transducer-based systems such as RNN Transducers (RNN-T) for automatic speech recognition (ASR) blend the individual components of a traditional hybrid ASR system (acoustic model, language model, punctuation model, inverse text normalization) into a single model. This greatly simplifies training and inference and hence makes RNN-T a desirable choice for ASR systems. In this work, we investigate the use of RNN-T in applications that require a tunable latency budget at inference time. We also improve the decoding speed of the originally proposed RNN-T beam search algorithm. We evaluate our proposed system on an English video ASR dataset and show that neural RNN-T models can achieve comparable WER and better computational efficiency compared to a well-tuned hybrid ASR baseline.
Database:	arXiv
External link:	http://arxiv.org/abs/1911.01629