Showing 1 - 9 of 9 for search: '"Cal Peyser"'
Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once. In this scheme, each model is used to generate pseudo-labels for unlabeled examples that are used to trai
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::bdc32e0f8d8479890ba4e62b1313c8de
http://arxiv.org/abs/2301.04327
Published in:
Interspeech 2022.
Authors:
W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Tara Sainath, Rohit Prabhavalkar, Cal Peyser, Zhiyun Lu, Cyril Allauzen
Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours in length is an ongoing challenge in speech recognition. A common solution is to segment the audio in advance using a separate voice activity detector
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e9c15fa03f1db0e0b6fc4f4d4951f31a
We introduce Lookup-Table Language Models (LookupLM), a method for scaling up the size of RNN language models with only a constant increase in the floating point operations, by increasing the expressivity of the embedding table. In particular, we ins
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d2954399b811bdd2ccf1818fa16057d9
http://arxiv.org/abs/2104.04552
Published in:
ICASSP
Proper nouns present a challenge for end-to-end (E2E) automatic speech recognition (ASR) systems in that a particular name may appear only rarely during training, and may have a pronunciation similar to that of a more common word. Unlike conventional
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b6514c22836446e9f5909bfba4ff3be6
http://arxiv.org/abs/2005.09756
A Streaming On-Device End-To-End Model Surpassing Server-Side Conventional Model Quality and Latency
Authors:
Zhifeng Chen, Ian McGraw, David Garcia, Mirko Visontai, Yuan Shangguan, Bo Li, Yanzhang He, Qiao Liang, Antoine Bruguier, Tara N. Sainath, Yash Sheth, Yu Zhang, Golan Pundak, Chung-Cheng Chiu, Raziel Alvarez, Ke Hu, Cal Peyser, David Rybach, Alex Gruenstein, Yonghui Wu, Trevor Strohman, Ruoming Pang, Ding Zhao, Rohit Prabhavalkar, Arun Narayanan, Shuo-Yiin Chang, Wei Li, Anjuli Kannan
Published in:
ICASSP
Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking.
Published in:
INTERSPEECH
End-to-end (E2E) automatic speech recognition (ASR) systems lack the distinct language model (LM) component that characterizes traditional speech systems. While this simplifies the model architecture, it complicates the task of incorporating text-onl
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::f589ab9631e0cf4b8199dd2f1fe57c35
Published in:
INTERSPEECH
Recognizing written domain numeric utterances (e.g. I need $1.25.) can be challenging for ASR systems, particularly when numeric sequences are not seen during training. This out-of-vocabulary (OOV) issue is addressed in conventional ASR systems by tr
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ebdcf6773649699ce7d675e5190ba61b