Showing 1 - 10 of 49 for search: '"Zoltán Tüske"'
Published in:
Interspeech 2021.
Authors:
Kailash Gopalakrishnan, Swagath Venkataramani, Wei Zhang, George Saon, Xiao Sun, Andrea Fasoli, Chia-Yu Chen, Xiaodong Cui, Mauricio J. Serrano, Zoltán Tüske, Naigang Wang, Brian Kingsbury
We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-H
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::692d7369d4cfd471cc31430dde91dc57
http://arxiv.org/abs/2108.12074
When recurrent neural network transducers (RNNTs) are trained using the typical maximum likelihood criterion, the prediction network is trained only on ground truth label sequences. This leads to a mismatch during inference, known as exposure bias, w
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::43552ddc9234b56d1b9945330f610fea
http://arxiv.org/abs/2108.10803
Authors:
Zoltán Tüske, Sachindra Joshi, Samuel Thomas, Brian Kingsbury, Hong-Kwang J. Kuo, Jatin Ganhotra, George Saon
End-to-end spoken language understanding (SLU) systems that process human-human or human-computer interactions are often context independent and process each turn of a conversation independently. Spoken conversations, on the other hand, are very much
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e3ea2a5d9d0e9befb147ae4ad8d44aa3
http://arxiv.org/abs/2108.08405
Published in:
ICASSP
We investigate a set of techniques for RNN Transducers (RNN-Ts) that were instrumental in lowering the word error rate on three different tasks (Switchboard 300 hours, conversational Spanish 780 hours and conversational Italian 900 hours). The techni
Authors:
Brian Kingsbury, Hong-Kwang J. Kuo, Zvi Kons, Ron Hoory, Gakuto Kurata, Samuel Thomas, Zoltán Tüske, George Saon
Published in:
ICASSP
We present a comprehensive study on building and adapting RNN transducer (RNN-T) models for spoken language understanding (SLU). These end-to-end (E2E) models are constructed in three practical settings: a case where verbatim transcripts are available
In our previous work we demonstrated that a single headed attention encoder-decoder model is able to reach state-of-the-art results in conversational speech recognition. In this paper, we further improve the results for both Switchboard 300 and 2000.
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2889d24f0470a6c9702112d1d9e75261
http://arxiv.org/abs/2105.00982
Published in:
ICASSP
Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of spoken language understanding (SLU) still need further inv
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a0e2f52c5ef493d0a61989a55a633f22
http://arxiv.org/abs/2011.08238
Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard
Published in:
INTERSPEECH
It is generally believed that direct sequence-to-sequence (seq2seq) speech recognition models are competitive with hybrid models only when a large amount of data, at least a thousand hours, is available for training. In this paper, we show that state
Authors:
Kartik Audhkhasi, Luis A. Lastras, Zoltán Tüske, Yinghui Huang, Zvi Kons, Samuel Thomas, Brian Kingsbury, Hong-Kwang J. Kuo, Gakuto Kurata, Ron Hoory
Published in:
INTERSPEECH
An essential component of spoken language understanding (SLU) is slot filling: representing the meaning of a spoken utterance using semantic entity labels. In this paper, we develop end-to-end (E2E) spoken language understanding systems that directly