A Better and Faster end-to-end Model for Streaming ASR

Authors:	Anmol Gulati, James Qin, Yonghui Wu, Yanzhang He, Yu Zhang, Tara N. Sainath, Trevor Strohman, Ruoming Pang, Arun Narayanan, Qiao Liang, Shuo-Yiin Chang, Chung-Cheng Chiu, Wei Han, Jiahui Yu, Bo Li
Publication year:	2021
Subject:	FOS: Computer and information sciences; Sound (cs.SD); Signal processing; Computer science; Word error rate; Computer Science - Sound; End-to-end principle; Audio and Speech Processing (eess.AS); FOS: Electrical engineering, electronic engineering, information engineering; Measurement uncertainty; Beam search; Latency (engineering); Encoder; Algorithm; Electrical Engineering and Systems Science - Audio and Speech Processing; Degradation (telecommunications)
Source:	ICASSP
Description:	End-to-end (E2E) models have been shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the model still tends to delay its predictions towards the end and thus has much higher partial latency than a conventional ASR model. To address this issue, we look at encouraging the E2E model to emit words early, through an algorithm called FastEmit [3]. Naturally, improving latency results in a quality degradation. To address this, we explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR. We also explore running a 2nd-pass beam search to improve quality. To ensure the 2nd pass completes quickly, we explore non-causal Conformer layers that feed into the same 1st-pass RNN-T decoder, an algorithm called Cascaded Encoders [5]. Overall, we find that the Conformer RNN-T with Cascaded Encoders offers a better quality and latency tradeoff for streaming ASR. Accepted at ICASSP 2021.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::79f2c3f759db2f52ddb94da74741686d https://doi.org/10.1109/icassp39728.2021.9413899