Search results

Dual Learning for Large Vocabulary On-Device ASR

Authors: Cal Peyser, Ronny Huang, Tara Sainath, Rohit Prabhavalkar, Michael Picheny, Kyunghyun Cho

Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once. In this scheme, each model is used to generate pseudo-labels for unlabeled examples that are used to trai

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::bdc32e0f8d8479890ba4e62b1313c8de
http://arxiv.org/abs/2301.04327

Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition

Authors: W. Ronny Huang, Cal Peyser, Tara Sainath, Ruoming Pang, Trevor D. Strohman, Shankar Kumar

Published in: Interspeech 2022.

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::126fee9566d926b8afcb2cafa04759a9
https://doi.org/10.21437/interspeech.2022-10820

Towards Disentangled Speech Representations

Authors: Cal Peyser, W. Ronny Huang, Andrew Rosenberg, Tara Sainath, Michael Picheny, Kyunghyun Cho

Published in: Interspeech 2022.

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::8ddf9006e58f138a4f985bdb900dd43e
https://doi.org/10.21437/interspeech.2022-30

E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR

Authors: W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Tara Sainath, Rohit Prabhavalkar, Cal Peyser, Zhiyun Lu, Cyril Allauzen

Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours in length is an ongoing challenge in speech recognition. A common solution is to segment the audio in advance using a separate voice activity detector

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e9c15fa03f1db0e0b6fc4f4d4951f31a

Lookup-Table Recurrent Language Models for Long Tail Speech Recognition

Authors: Shankar Kumar, Trevor Strohman, Tara N. Sainath, Cal Peyser, David Rybach, W. Ronny Huang

We introduce Lookup-Table Language Models (LookupLM), a method for scaling up the size of RNN language models with only a constant increase in the floating point operations, by increasing the expressivity of the embedding table. In particular, we ins

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d2954399b811bdd2ccf1818fa16057d9
http://arxiv.org/abs/2104.04552

Improving Proper Noun Recognition in End-to-End ASR By Customization of the MWER Loss Criterion

Authors: Cal Peyser, Tara N. Sainath, Golan Pundak

Published in: ICASSP

Proper nouns present a challenge for end-to-end (E2E) automatic speech recognition (ASR) systems in that a particular name may appear only rarely during training, and may have a pronunciation similar to that of a more common word. Unlike conventional

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b6514c22836446e9f5909bfba4ff3be6
http://arxiv.org/abs/2005.09756

A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency

Published in: ICASSP

Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking.

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::a5f0da7e8de58cbb33182c464d112a7f
https://doi.org/10.1109/icassp40776.2020.9054188

Improving Tail Performance of a Deliberation E2E ASR Model Using a Large Text Corpus

Authors: Tara N. Sainath, James Apfel, Ruoming Pang, Sepand Mavandadi, Shankar Kumar, Cal Peyser

Published in: INTERSPEECH

End-to-end (E2E) automatic speech recognition (ASR) systems lack the distinct language model (LM) component that characterizes traditional speech systems. While this simplifies the model architecture, it complicates the task of incorporating text-onl

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::f589ab9631e0cf4b8199dd2f1fe57c35

Improving Performance of End-to-End ASR on Numeric Sequences

Authors: Zelin Wu, Hao Zhang, Cal Peyser, Tara N. Sainath

Published in: INTERSPEECH

Recognizing written domain numeric utterances (e.g. I need $1.25.) can be challenging for ASR systems, particularly when numeric sequences are not seen during training. This out-of-vocabulary (OOV) issue is addressed in conventional ASR systems by tr

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ebdcf6773649699ce7d675e5190ba61b
