Showing 1 - 10 of 128 for search: '"P, Prabhavalkar"'
Author:
Meng, Zhong, Wu, Zelin, Prabhavalkar, Rohit, Peyser, Cal, Wang, Weiran, Chen, Nanxin, Sainath, Tara N., Ramabhadran, Bhuvana
Published in:
Interspeech 2024, Kos Island, Greece
Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhan
External link:
http://arxiv.org/abs/2406.02921
Gradient clipping plays a vital role in training large-scale automatic speech recognition (ASR) models. It is typically applied to minibatch gradients to prevent gradient explosion, and to the individual sample gradients to mitigate unintended memori
External link:
http://arxiv.org/abs/2406.02004
Author:
Wu, Zelin, Song, Gan, Li, Christopher, Rondon, Pat, Meng, Zhong, Velez, Xavier, Wang, Weiran, Caseiro, Diamantino, Pundak, Golan, Munkhdalai, Tsendsuren, Chandorkar, Angad, Prabhavalkar, Rohit
Published in:
2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Industry Track
Contextual biasing enables speech recognizers to transcribe important phrases in the speaker's context, such as contact names, even if they are rare in, or absent from, the training data. Attention-based biasing is a leading approach which allows for
External link:
http://arxiv.org/abs/2404.10180
Author:
Prabhavalkar, Rohit, Meng, Zhong, Wang, Weiran, Stooke, Adam, Cai, Xingyu, He, Yanzhang, Narayanan, Arun, Hwang, Dongseong, Sainath, Tara N., Moreno, Pedro J.
The accuracy of end-to-end (E2E) automatic speech recognition (ASR) models continues to improve as they are scaled to larger sizes, with some now reaching billions of parameters. Widespread deployment and adoption of these models, however, requires c
External link:
http://arxiv.org/abs/2402.17184
Author:
Arumugam, Guru Prakash, Chang, Shuo-yiin, Sainath, Tara N., Prabhavalkar, Rohit, Wang, Quan, Bijwadia, Shaan
ASR models often suffer from a long-form deletion problem where the model predicts sequential blanks instead of words when transcribing a lengthy audio (in the order of minutes or hours). From the perspective of a user or downstream system consuming
External link:
http://arxiv.org/abs/2312.11123
Author:
Ding, Shaojin, Qiu, David, Rim, David, He, Yanzhang, Rybakov, Oleg, Li, Bo, Prabhavalkar, Rohit, Wang, Weiran, Sainath, Tara N., Han, Zhonglin, Li, Jian, Yazdanbakhsh, Amir, Agrawal, Shivani
End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the enormous memo
External link:
http://arxiv.org/abs/2312.08553
Author:
Wang, Weiran, Wu, Zelin, Caseiro, Diamantino, Munkhdalai, Tsendsuren, Sim, Khe Chai, Rondon, Pat, Pundak, Golan, Song, Gan, Prabhavalkar, Rohit, Meng, Zhong, Zhao, Ding, Sainath, Tara, Mengibar, Pedro Moreno
Contextual biasing refers to the problem of biasing the automatic speech recognition (ASR) systems towards rare entities that are relevant to the specific user or application scenarios. We propose algorithms for contextual biasing based on the Knuth-
External link:
http://arxiv.org/abs/2310.00178
Author:
Zhou, Lillian, Ding, Yuxin, Chen, Mingqing, Zhang, Harry, Prabhavalkar, Rohit, Guliani, Dhruv, Motta, Giovanni, Mathews, Rajiv
Automatic speech recognition (ASR) models are typically trained on large datasets of transcribed speech. As language evolves and new terms come into use, these models can become outdated and stale. In the context of models trained on the server but d
External link:
http://arxiv.org/abs/2310.00141
Author:
Wang, Weiran, Prabhavalkar, Rohit, Hwang, Dongseong, Li, Qiujia, Sim, Khe Chai, Li, Bo, Qin, James, Cai, Xingyu, Stooke, Adam, Meng, Zhong, Zheng, CJ, He, Yanzhang, Sainath, Tara, Mengibar, Pedro Moreno
In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model paramet
External link:
http://arxiv.org/abs/2309.12963
Author:
Peyser, Cal, Meng, Zhong, Hu, Ke, Prabhavalkar, Rohit, Rosenberg, Andrew, Sainath, Tara N., Picheny, Michael, Cho, Kyunghyun
Published in:
INTERSPEECH 2023
The last year has seen astonishing progress in text-prompted image generation premised on the idea of a cross-modal representation space in which the text and image domains are represented jointly. In ASR, this idea has found application as joint spe
External link:
http://arxiv.org/abs/2308.06125