Showing 1 - 10 of 17 for search: '"Tuske, Zoltan"'
In this paper, alternating weak triphone/BPE alignment supervision is proposed to improve end-to-end model training. Towards this end, triphone and BPE alignments are extracted using a pre-existing hybrid ASR system. Then, regularization effect is ob
External link:
http://arxiv.org/abs/2402.15594
In this paper, we aim to create weak alignment supervision from an existing hybrid system to aid the end-to-end modeling of automatic speech recognition. Towards this end, we use the existing hybrid ASR system to produce triphone alignments of the tr
External link:
http://arxiv.org/abs/2311.14835
The goal of spoken language understanding (SLU) systems is to determine the meaning of the input speech signal, unlike speech recognition which aims to produce verbatim transcripts. Advances in end-to-end (E2E) speech modeling have made it possible t
External link:
http://arxiv.org/abs/2201.12105
When recurrent neural network transducers (RNNTs) are trained using the typical maximum likelihood criterion, the prediction network is trained only on ground truth label sequences. This leads to a mismatch during inference, known as exposure bias, w
External link:
http://arxiv.org/abs/2108.10803
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of spoken language understanding (SLU) still need further inv
External link:
http://arxiv.org/abs/2011.08238
Author:
Thomas, Samuel, Suzuki, Masayuki, Huang, Yinghui, Kurata, Gakuto, Tuske, Zoltan, Saon, George, Kingsbury, Brian, Picheny, Michael, Dibert, Tom, Kaiser-Schatzlein, Alice, Samko, Bern
With recent advances in deep learning, considerable attention has been given to achieving automatic speech recognition performance close to human performance on tasks like conversational telephone speech (CTS) recognition. In this paper we evaluate t
External link:
http://arxiv.org/abs/1904.13258
Published in:
2016 IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP); 2016, p6005-6009, 5p
Published in:
2015 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU); 1/1/2015, p596-603, 8p
Author:
Cui, Jia, Kingsbury, Brian, Ramabhadran, Bhuvana, Sethy, Abhinav, Audhkhasi, Kartik, Cui, Xiaodong, Kislal, Ellen, Mangu, Lidia, Nussbaum-Thom, Markus, Picheny, Michael, Tuske, Zoltan, Golik, Pavel, Schluter, Ralf, Ney, Hermann, Gales, Mark J. F., Knill, Kate M., Ragni, Anton, Wang, Haipeng, Woodland, Phil
Published in:
2015 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU); 1/1/2015, p259-266, 8p
Published in:
2015 IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP); 2015, p4285-4289, 5p