Showing 1 - 5 of 5 results for search: '"Schubert, Kjell"'
Author:
Le, Duc, Seide, Frank, Wang, Yuhao, Li, Yang, Schubert, Kjell, Kalinli, Ozlem, Seltzer, Michael L.
We show how factoring the RNN-T's output distribution can significantly reduce the computation cost and power consumption for on-device ASR inference with no loss in accuracy. With the rise in popularity of neural-transducer type models like the RNN-…
External link:
http://arxiv.org/abs/2211.00896
Author:
Pandey, Laxmi, Paul, Debjyoti, Chitkara, Pooja, Pang, Yutong, Zhang, Xuedong, Schubert, Kjell, Chou, Mark, Liu, Shu, Saraf, Yatharth
Inverse text normalization (ITN) is used to convert the spoken form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile, neural modeling appro…
External link:
http://arxiv.org/abs/2207.09674
Author:
Zhang, Xiaohui, Zhang, Frank, Liu, Chunxi, Schubert, Kjell, Chan, Julian, Prakash, Pradyot, Liu, Jun, Yeh, Ching-Feng, Peng, Fuchun, Saraf, Yatharth, Zweig, Geoffrey
In this work, to measure the accuracy and efficiency for a latency-controlled streaming automatic speech recognition (ASR) application, we perform comprehensive evaluations on three popular training criteria: LF-MMI, CTC and RNN-T. In transcribing so…
External link:
http://arxiv.org/abs/2011.04785
Author:
Jain, Mahaveer, Schubert, Kjell, Mahadeokar, Jay, Yeh, Ching-Feng, Kalgaonkar, Kaustubh, Sriram, Anuroop, Fuegen, Christian, Seltzer, Michael L.
Neural transducer-based systems such as RNN Transducers (RNN-T) for automatic speech recognition (ASR) blend the individual components of a traditional hybrid ASR system (acoustic model, language model, punctuation model, inverse text normalization)…
External link:
http://arxiv.org/abs/1911.01629
Author:
Yeh, Ching-Feng, Mahadeokar, Jay, Kalgaonkar, Kaustubh, Wang, Yongqiang, Le, Duc, Jain, Mahaveer, Schubert, Kjell, Fuegen, Christian, Seltzer, Michael L.
We explore options to use Transformer networks in a neural transducer for end-to-end speech recognition. Transformer networks use self-attention for sequence modeling and come with advantages in parallel computation and capturing contexts. We propose…
External link:
http://arxiv.org/abs/1910.12977