Search results - "P, Prabhavalkar"

Report

Text Injection for Neural Contextual Biasing

Authors: Meng, Zhong; Wu, Zelin; Prabhavalkar, Rohit; Peyser, Cal; Wang, Weiran; Chen, Nanxin; Sainath, Tara N.; Ramabhadran, Bhuvana

Published in: Interspeech 2024, Kos Island, Greece

Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhan

External link: http://arxiv.org/abs/2406.02921

Report

Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping

Authors: Wang, Lun; Thakkar, Om; Meng, Zhong; Rafidi, Nicole; Prabhavalkar, Rohit; Narayanan, Arun

Gradient clipping plays a vital role in training large-scale automatic speech recognition (ASR) models. It is typically applied to minibatch gradients to prevent gradient explosion, and to the individual sample gradients to mitigate unintended memori

External link: http://arxiv.org/abs/2406.02004

Report

Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

Authors: Wu, Zelin; Song, Gan; Li, Christopher; Rondon, Pat; Meng, Zhong; Velez, Xavier; Wang, Weiran; Caseiro, Diamantino; Pundak, Golan; Munkhdalai, Tsendsuren; Chandorkar, Angad; Prabhavalkar, Rohit

Published in: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Industry Track

Contextual biasing enables speech recognizers to transcribe important phrases in the speaker's context, such as contact names, even if they are rare in, or absent from, the training data. Attention-based biasing is a leading approach which allows for

External link: http://arxiv.org/abs/2404.10180

Report

Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models

Authors: Prabhavalkar, Rohit; Meng, Zhong; Wang, Weiran; Stooke, Adam; Cai, Xingyu; He, Yanzhang; Narayanan, Arun; Hwang, Dongseong; Sainath, Tara N.; Moreno, Pedro J.

The accuracy of end-to-end (E2E) automatic speech recognition (ASR) models continues to improve as they are scaled to larger sizes, with some now reaching billions of parameters. Widespread deployment and adoption of these models, however, requires c

External link: http://arxiv.org/abs/2402.17184

Report

Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers

Authors: Arumugam, Guru Prakash; Chang, Shuo-yiin; Sainath, Tara N.; Prabhavalkar, Rohit; Wang, Quan; Bijwadia, Shaan

ASR models often suffer from a long-form deletion problem where the model predicts sequential blanks instead of words when transcribing a lengthy audio (in the order of minutes or hours). From the perspective of a user or downstream system consuming

External link: http://arxiv.org/abs/2312.11123

Report

USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models

Authors: Ding, Shaojin; Qiu, David; Rim, David; He, Yanzhang; Rybakov, Oleg; Li, Bo; Prabhavalkar, Rohit; Wang, Weiran; Sainath, Tara N.; Han, Zhonglin; Li, Jian; Yazdanbakhsh, Amir; Agrawal, Shivani

End-to-end automatic speech recognition (ASR) models have seen revolutionary quality gains with the recent development of large-scale universal speech models (USM). However, deploying these massive USMs is extremely expensive due to the enormous memo

External link: http://arxiv.org/abs/2312.08553

Report

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

Authors: Wang, Weiran; Wu, Zelin; Caseiro, Diamantino; Munkhdalai, Tsendsuren; Sim, Khe Chai; Rondon, Pat; Pundak, Golan; Song, Gan; Prabhavalkar, Rohit; Meng, Zhong; Zhao, Ding; Sainath, Tara; Mengibar, Pedro Moreno

Contextual biasing refers to the problem of biasing the automatic speech recognition (ASR) systems towards rare entities that are relevant to the specific user or application scenarios. We propose algorithms for contextual biasing based on the Knuth-

External link: http://arxiv.org/abs/2310.00178

Report

The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning

Authors: Zhou, Lillian; Ding, Yuxin; Chen, Mingqing; Zhang, Harry; Prabhavalkar, Rohit; Guliani, Dhruv; Motta, Giovanni; Mathews, Rajiv

Automatic speech recognition (ASR) models are typically trained on large datasets of transcribed speech. As language evolves and new terms come into use, these models can become outdated and stale. In the context of models trained on the server but d

External link: http://arxiv.org/abs/2310.00141

Report

Massive End-to-end Models for Short Search Queries

Authors: Wang, Weiran; Prabhavalkar, Rohit; Hwang, Dongseong; Li, Qiujia; Sim, Khe Chai; Li, Bo; Qin, James; Cai, Xingyu; Stooke, Adam; Meng, Zhong; Zheng, CJ; He, Yanzhang; Sainath, Tara; Mengibar, Pedro Moreno

In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model paramet

External link: http://arxiv.org/abs/2309.12963

Report

Improving Joint Speech-Text Representations Without Alignment

Authors: Peyser, Cal; Meng, Zhong; Hu, Ke; Prabhavalkar, Rohit; Rosenberg, Andrew; Sainath, Tara N.; Picheny, Michael; Cho, Kyunghyun

Published in: INTERSPEECH 2023

The last year has seen astonishing progress in text-prompted image generation premised on the idea of a cross-modal representation space in which the text and image domains are represented jointly. In ASR, this idea has found application as joint spe

External link: http://arxiv.org/abs/2308.06125
