Showing 1 - 10 of 19 for search: '"Peyser, Cal"'
Author:
Meng, Zhong, Wu, Zelin, Prabhavalkar, Rohit, Peyser, Cal, Wang, Weiran, Chen, Nanxin, Sainath, Tara N., Ramabhadran, Bhuvana
Published in:
Interspeech 2024, Kos Island, Greece
Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhance… A generic contextual-biasing sketch follows this entry.
External link:
http://arxiv.org/abs/2406.02921
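The CTI snippet above is truncated; as background, a common baseline for contextual biasing is an on-the-fly, shallow-fusion-style score boost for context phrases during beam search. The trie, boost weight, and phrase list in this sketch are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of trie-based contextual biasing: a decoding hypothesis
# that extends a known bias phrase gets an additive log-prob bonus.

class BiasTrie:
    def __init__(self, phrases):
        self.root = {}
        for phrase in phrases:
            node = self.root
            for token in phrase.split():
                node = node.setdefault(token, {})

    def extensions(self, prefix):
        """Tokens that extend `prefix` within some bias phrase."""
        node = self.root
        for token in prefix:
            node = node.get(token)
            if node is None:
                return set()
        return set(node)

def biased_score(base_log_prob, token, matched_prefix, trie, boost=2.0):
    """Add a fixed bonus when `token` continues a bias phrase."""
    bonus = boost if token in trie.extensions(matched_prefix) else 0.0
    return base_log_prob + bonus

trie = BiasTrie(["call mom", "cal peyser"])
print(biased_score(-3.2, "peyser", ["cal"], trie))  # -1.2 after the boost
```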
Author:
Peyser, Cal, Meng, Zhong, Hu, Ke, Prabhavalkar, Rohit, Rosenberg, Andrew, Sainath, Tara N., Picheny, Michael, Cho, Kyunghyun
Published in:
INTERSPEECH 2023
The last year has seen astonishing progress in text-prompted image generation premised on the idea of a cross-modal representation space in which the text and image domains are represented jointly. In ASR, this idea has found application as joint speech-text… A generic sketch of such a joint-space objective follows this entry.
External link:
http://arxiv.org/abs/2308.06125
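The cross-modal space this abstract alludes to is often learned with a CLIP-style symmetric contrastive loss over paired speech and text embeddings. The sketch below shows that generic objective; the encoder outputs, batch size, and temperature are placeholders, not necessarily this paper's training setup.

```python
import torch
import torch.nn.functional as F

def joint_space_contrastive_loss(speech_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: the i-th speech embedding should be most
    similar to the i-th text embedding, and vice versa."""
    s = F.normalize(speech_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = s @ t.T / temperature              # [batch, batch] similarities
    targets = torch.arange(s.size(0))           # matched pairs on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Stand-in encoder outputs for a batch of 8 paired utterances/transcripts.
loss = joint_space_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```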
Author:
Peyser, Cal, Picheny, Michael, Cho, Kyunghyun, Prabhavalkar, Rohit, Huang, Ronny, Sainath, Tara
Published in:
2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Unpaired text and audio injection have emerged as dominant methods for improving ASR performance in the absence of a large labeled corpus. However, little guidance exists on deploying these methods to improve production ASR systems that are trained on…
External link:
http://arxiv.org/abs/2304.11053
Author:
Peyser, Cal, Huang, Ronny, Sainath, Tara, Prabhavalkar, Rohit, Picheny, Michael, Cho, Kyunghyun
Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once. In this scheme, each model is used to generate pseudo-labels for unlabeled examples that are used to train the other model… A sketch of this loop follows this entry.
External link:
http://arxiv.org/abs/2301.04327
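A minimal sketch of the loop the abstract describes, assuming a speech-to-text model (`asr`) and a text-to-speech counterpart (`tts`); the objects and their methods are hypothetical placeholders, not a real API.

```python
def dual_learning_round(asr, tts, unpaired_audio, unpaired_text):
    """One round of dual learning between two opposite tasks."""
    # Each model pseudo-labels the data its counterpart lacks labels for.
    pseudo_text = [asr.transcribe(a) for a in unpaired_audio]   # audio -> text
    pseudo_audio = [tts.synthesize(t) for t in unpaired_text]   # text -> audio

    # The resulting pseudo-pairs supervise the *other* model.
    asr.train_on(zip(pseudo_audio, unpaired_text))  # (audio, text) pairs
    tts.train_on(zip(pseudo_text, unpaired_audio))  # (text, audio) pairs
```

Iterating this round lets both models bootstrap from unlabeled data, at the cost of feeding back each other's errors.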
Author:
Huang, W. Ronny, Chang, Shuo-Yiin, Sainath, Tara N., He, Yanzhang, Rybach, David, David, Robert, Prabhavalkar, Rohit, Allauzen, Cyril, Peyser, Cal, Strohman, Trevor D.
We explore unifying a neural segmenter with two-pass cascaded encoder ASR into a single model. A key challenge is allowing the segmenter (which runs in real-time, synchronously with the decoder) to finalize the 2nd pass (which runs 900 ms behind real time)…
External link:
http://arxiv.org/abs/2211.15432
The careful construction of audio representations has become a dominant feature in the design of approaches to many speech tasks. Increasingly, such approaches have emphasized "disentanglement", where a representation contains only parts of the speech signal… A generic disentanglement sketch follows this entry.
External link:
http://arxiv.org/abs/2208.13191
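One standard recipe for such disentanglement, offered here as a generic illustration rather than this paper's method, is adversarial training with gradient reversal: a classifier tries to predict a nuisance factor (say, speaker identity) from the content representation, and the reversed gradient drives that information out of it.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates the gradient in the backward."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad

class DisentangledEncoder(nn.Module):
    def __init__(self, dim=256, n_speakers=100):
        super().__init__()
        self.content = nn.Linear(dim, dim)             # toy "content" branch
        self.speaker_adv = nn.Linear(dim, n_speakers)  # adversarial head

    def forward(self, x):
        c = self.content(x)
        # The adversary receives reversed gradients, so minimizing its
        # loss pushes speaker information *out* of the content vector.
        speaker_logits = self.speaker_adv(GradReverse.apply(c))
        return c, speaker_logits

enc = DisentangledEncoder()
content, logits = enc(torch.randn(4, 256))
print(content.shape, logits.shape)  # torch.Size([4, 256]) torch.Size([4, 100])
```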
Author:
Huang, W. Ronny, Chang, Shuo-yiin, Rybach, David, Prabhavalkar, Rohit, Sainath, Tara N., Allauzen, Cyril, Peyser, Cal, Lu, Zhiyun
Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours in length is an ongoing challenge in speech recognition. A common solution is to segment the audio in advance using a separate voice activity detector… A toy version of that baseline follows this entry.
External link:
http://arxiv.org/abs/2204.10749
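The pre-segmentation baseline named in the abstract can be mimicked with a toy frame-energy VAD; real systems use learned VADs, so the frame size and threshold here are arbitrary assumptions.

```python
import numpy as np

def energy_vad_segments(audio, sr=16000, frame_ms=30, threshold=1e-3):
    """Toy VAD: return (start, end) sample ranges of contiguous
    frames whose mean energy exceeds `threshold`."""
    frame = int(sr * frame_ms / 1000)
    voiced = [np.mean(audio[i * frame:(i + 1) * frame] ** 2) > threshold
              for i in range(len(audio) // frame)]
    segments, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i * frame                      # segment opens
        elif not v and start is not None:
            segments.append((start, i * frame))    # segment closes
            start = None
    if start is not None:
        segments.append((start, len(voiced) * frame))
    return segments

# One second of silence followed by one second of noise ("speech").
audio = np.concatenate([np.zeros(16000), 0.1 * np.random.randn(16000)])
print(energy_vad_segments(audio))  # roughly one segment in the second half
```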
Author:
Wang, Weiran, Chen, Tongzhou, Sainath, Tara N., Variani, Ehsan, Prabhavalkar, Rohit, Huang, Ronny, Ramabhadran, Bhuvana, Gaur, Neeraj, Mavandadi, Sepand, Peyser, Cal, Strohman, Trevor, He, Yanzhang, Rybach, David
Language models (LMs) significantly improve the recognition accuracy of end-to-end (E2E) models on words rarely seen during training, when used in either the shallow fusion or the rescoring setups. In this work, we introduce LMs in the learning of hybrid autoregressive transducer (HAT) models… A shallow-fusion sketch follows this entry.
External link:
http://arxiv.org/abs/2204.07553
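For reference, shallow fusion, one of the two setups named above, ranks each candidate token by its E2E log-probability plus a weighted LM log-probability; the weight and toy vocabulary below are illustrative.

```python
import math

def shallow_fusion_scores(e2e_log_probs, lm_log_probs, lam=0.3):
    """Per-token fused scores used to rank beam-search candidates.

    `lam` is the fusion weight, normally tuned on a dev set; tokens the
    LM has never scored fall back to -inf here for simplicity.
    """
    return {tok: lp + lam * lm_log_probs.get(tok, -math.inf)
            for tok, lp in e2e_log_probs.items()}

e2e = {"cal": -0.6, "call": -0.7}   # E2E model slightly prefers "cal"
lm = {"cal": -3.0, "call": -0.5}    # external LM strongly prefers "call"
fused = shallow_fusion_scores(e2e, lm)
print(max(fused, key=fused.get))    # "call": fusion flips the decision
```

Rescoring, the other setup, applies the same interpolation to complete hypotheses after beam search rather than per decoding step.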
Author:
Huang, W. Ronny, Peyser, Cal, Sainath, Tara N., Pang, Ruoming, Strohman, Trevor, Kumar, Shankar
Language model fusion helps smart assistants recognize words which are rare in acoustic data but abundant in text-only corpora (typed search logs). However, such corpora have properties that hinder downstream performance, including being (1) too large…
External link:
http://arxiv.org/abs/2203.05008
Author:
Huang, W. Ronny, Sainath, Tara N., Peyser, Cal, Kumar, Shankar, Rybach, David, Strohman, Trevor
We introduce Lookup-Table Language Models (LookupLM), a method for scaling up the size of RNN language models with only a constant increase in the floating point operations, by increasing the expressivity of the embedding table. In particular, we insert… A toy illustration of the constant-FLOPs idea follows this entry.
External link:
http://arxiv.org/abs/2104.04552
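The snippet cuts off mid-sentence, but the stated goal, a bigger embedding table at constant FLOP cost, can be illustrated by adding a second, n-gram-keyed embedding lookup to the usual token lookup: table lookups are O(1) regardless of table size. The bigram hashing scheme and sizes below are assumptions for illustration, not the paper's exact design.

```python
import torch
from torch import nn

class LookupLMEmbedding(nn.Module):
    """Toy sketch: augment the token embedding with a bigram-keyed table.

    Growing `buckets` enlarges the model's parameter count while the
    per-step compute stays a single extra lookup and add.
    """
    def __init__(self, vocab=1000, dim=128, buckets=50000):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.ngram = nn.Embedding(buckets, dim)
        self.vocab, self.buckets = vocab, buckets

    def forward(self, prev_tok, cur_tok):
        # Hash the (previous, current) token pair into a table bucket.
        bigram_id = (prev_tok * self.vocab + cur_tok) % self.buckets
        return self.tok(cur_tok) + self.ngram(bigram_id)

emb = LookupLMEmbedding()
prev, cur = torch.tensor([3]), torch.tensor([17])
print(emb(prev, cur).shape)  # torch.Size([1, 128])
```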