Search results - "Tuske, Zoltan"

Report

Alternating Weak Triphone/BPE Alignment Supervision from Hybrid Model Improves End-to-End ASR

Authors: Jiang, Jintao, Gao, Yingbo, Zeineldeen, Mohammad, Tuske, Zoltan

In this paper, alternating weak triphone/BPE alignment supervision is proposed to improve end-to-end model training. Towards this end, triphone and BPE alignments are extracted using a pre-existing hybrid ASR system. Then, regularization effect is ob…

External link: http://arxiv.org/abs/2402.15594

Report

Weak Alignment Supervision from Hybrid Model Improves End-to-end ASR

Authors: Jiang, Jintao, Gao, Yingbo, Tuske, Zoltan

In this paper, we aim to create weak alignment supervision from an existing hybrid system to aid the end-to-end modeling of automatic speech recognition. Towards this end, we use the existing hybrid ASR system to produce triphone alignments of the tr…

External link: http://arxiv.org/abs/2311.14835

Report

Improving End-to-End Models for Set Prediction in Spoken Language Understanding

Authors: Kuo, Hong-Kwang J., Tuske, Zoltan, Thomas, Samuel, Kingsbury, Brian, Saon, George

The goal of spoken language understanding (SLU) systems is to determine the meaning of the input speech signal, unlike speech recognition, which aims to produce verbatim transcripts. Advances in end-to-end (E2E) speech modeling have made it possible t…

External link: http://arxiv.org/abs/2201.12105

Report

Reducing Exposure Bias in Training Recurrent Neural Network Transducers

Authors: Cui, Xiaodong, Kingsbury, Brian, Saon, George, Haws, David, Tuske, Zoltan

When recurrent neural network transducers (RNNTs) are trained using the typical maximum likelihood criterion, the prediction network is trained only on ground truth label sequences. This leads to a mismatch during inference, known as exposure bias, w…

External link: http://arxiv.org/abs/2108.10803

Report

End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features

Authors: Morais, Edmilson, Kuo, Hong-Kwang J., Thomas, Samuel, Tuske, Zoltan, Kingsbury, Brian

Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of spoken language understanding (SLU) still need further inv…

External link: http://arxiv.org/abs/2011.08238

Report

English Broadcast News Speech Recognition by Humans and Machines

Authors: Thomas, Samuel, Suzuki, Masayuki, Huang, Yinghui, Kurata, Gakuto, Tuske, Zoltan, Saon, George, Kingsbury, Brian, Picheny, Michael, Dibert, Tom, Kaiser-Schatzlein, Alice, Samko, Bern…

With recent advances in deep learning, considerable attention has been given to achieving automatic speech recognition performance close to human performance on tasks like conversational telephone speech (CTS) recognition. In this paper we evaluate t…

External link: http://arxiv.org/abs/1904.13258