Search results

Report

Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions

Authors: Baskar, Murali Karthick; Rosenberg, Andrew; Ramabhadran, Bhuvana; Gaur, Neeraj; Meng, Zhong

In this paper, we focus on addressing the constraints faced when applying LLMs to ASR. Recent works utilize prefixLM-type models, which directly apply speech as a prefix to LLMs for ASR. We have found that optimizing speech prefixes leads to better A

External link: http://arxiv.org/abs/2406.14701

Report

Text Injection for Neural Contextual Biasing

Authors: Meng, Zhong; Wu, Zelin; Prabhavalkar, Rohit; Peyser, Cal; Wang, Weiran; Chen, Nanxin; Sainath, Tara N.; Ramabhadran, Bhuvana

Published in: Interspeech 2024, Kos Island, Greece

Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhan

External link: http://arxiv.org/abs/2406.02921

Report

Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping

Authors: Wang, Lun; Thakkar, Om; Meng, Zhong; Rafidi, Nicole; Prabhavalkar, Rohit; Narayanan, Arun

Gradient clipping plays a vital role in training large-scale automatic speech recognition (ASR) models. It is typically applied to minibatch gradients to prevent gradient explosion, and to the individual sample gradients to mitigate unintended memori

External link: http://arxiv.org/abs/2406.02004

Report

Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

Authors: Wu, Zelin; Song, Gan; Li, Christopher; Rondon, Pat; Meng, Zhong; Velez, Xavier; Wang, Weiran; Caseiro, Diamantino; Pundak, Golan; Munkhdalai, Tsendsuren; Chandorkar, Angad; Prabhavalkar, Rohit

Published in: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Industry Track

Contextual biasing enables speech recognizers to transcribe important phrases in the speaker's context, such as contact names, even if they are rare in, or absent from, the training data. Attention-based biasing is a leading approach which allows for

External link: http://arxiv.org/abs/2404.10180

Report

Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models

Authors: Prabhavalkar, Rohit; Meng, Zhong; Wang, Weiran; Stooke, Adam; Cai, Xingyu; He, Yanzhang; Narayanan, Arun; Hwang, Dongseong; Sainath, Tara N.; Moreno, Pedro J.

The accuracy of end-to-end (E2E) automatic speech recognition (ASR) models continues to improve as they are scaled to larger sizes, with some now reaching billions of parameters. Widespread deployment and adoption of these models, however, requires c

External link: http://arxiv.org/abs/2402.17184

Report

SLM: Bridge the thin gap between speech and text foundation models

Authors: Wang, Mingqiu; Han, Wei; Shafran, Izhak; Wu, Zelin; Chiu, Chung-Cheng; Cao, Yuan; Wang, Yongqiang; Chen, Nanxin; Zhang, Yu; Soltau, Hagen; Rubenstein, Paul; Zilka, Lukas; Yu, Dian; Meng, Zhong; Pundak, Golan; Siddhartha, Nikhil; Schalkwyk, Johan; Wu, Yonghui

We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-modal model that takes advantage of pretrained foundational speech and language models. SLM freezes the pretrained foundation models to maximally preserve their

External link: http://arxiv.org/abs/2310.00230

Report

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

Authors: Wang, Weiran; Wu, Zelin; Caseiro, Diamantino; Munkhdalai, Tsendsuren; Sim, Khe Chai; Rondon, Pat; Pundak, Golan; Song, Gan; Prabhavalkar, Rohit; Meng, Zhong; Zhao, Ding; Sainath, Tara; Mengibar, Pedro Moreno

Contextual biasing refers to the problem of biasing the automatic speech recognition (ASR) systems towards rare entities that are relevant to the specific user or application scenarios. We propose algorithms for contextual biasing based on the Knuth-

External link: http://arxiv.org/abs/2310.00178

Report

Massive End-to-end Models for Short Search Queries

Authors: Wang, Weiran; Prabhavalkar, Rohit; Hwang, Dongseong; Li, Qiujia; Sim, Khe Chai; Li, Bo; Qin, James; Cai, Xingyu; Stooke, Adam; Meng, Zhong; Zheng, CJ; He, Yanzhang; Sainath, Tara; Mengibar, Pedro Moreno

In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model paramet

External link: http://arxiv.org/abs/2309.12963

Report

Augmenting conformers with structured state-space sequence models for online speech recognition

Authors: Shan, Haozhe; Gu, Albert; Meng, Zhong; Wang, Weiran; Choromanski, Krzysztof; Sainath, Tara

Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems. In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space seq

External link: http://arxiv.org/abs/2309.08551

Report

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

Authors: Bijwadia, Shaan; Chang, Shuo-yiin; Wang, Weiran; Meng, Zhong; Zhang, Hao; Sainath, Tara N.

Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is used to supplement paired audio-text data, has shown promising improvements for word error rate. This study examines the use of text injection for auxiliary tas

External link: http://arxiv.org/abs/2308.07395
