Showing 1 - 10 of 19 for search: '"Peyser, Cal"'
Author:
Meng, Zhong, Wu, Zelin, Prabhavalkar, Rohit, Peyser, Cal, Wang, Weiran, Chen, Nanxin, Sainath, Tara N., Ramabhadran, Bhuvana
Published in:
Interspeech 2024, Kos Island, Greece
Neural contextual biasing effectively improves automatic speech recognition (ASR) for crucial phrases within a speaker's context, particularly those that are infrequent in the training data. This work proposes contextual text injection (CTI) to enhance… A generic contextual-biasing sketch follows this entry.
External link:
http://arxiv.org/abs/2406.02921
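The CTI snippet above is truncated; as background, a common baseline for contextual biasing is an on-the-fly, shallow-fusion-style score boost for context phrases during beam search. The trie, boost weight, and phrase list in this sketch are illustrative assumptions, not the paper's method.

```python
# Minimal sketch of trie-based contextual biasing: a decoding hypothesis
# that extends a known bias phrase gets an additive log-prob bonus.

class BiasTrie:
    def __init__(self, phrases):
        self.root = {}
        for phrase in phrases:
            node = self.root
            for token in phrase.split():
                node = node.setdefault(token, {})

    def extensions(self, prefix):
        """Tokens that extend `prefix` within some bias phrase."""
        node = self.root
        for token in prefix:
            node = node.get(token)
            if node is None:
                return set()
        return set(node)

def biased_score(base_log_prob, token, matched_prefix, trie, boost=2.0):
    """Add a fixed bonus when `token` continues a bias phrase."""
    bonus = boost if token in trie.extensions(matched_prefix) else 0.0
    return base_log_prob + bonus

trie = BiasTrie(["call mom", "cal peyser"])
print(biased_score(-3.2, "peyser", ["cal"], trie))  # -1.2 after the boost
```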
Author:
Peyser, Cal, Meng, Zhong, Hu, Ke, Prabhavalkar, Rohit, Rosenberg, Andrew, Sainath, Tara N., Picheny, Michael, Cho, Kyunghyun
Published in:
INTERSPEECH 2023
The last year has seen astonishing progress in text-prompted image generation premised on the idea of a cross-modal representation space in which the text and image domains are represented jointly. In ASR, this idea has found application as joint speech-text… A generic sketch of such a joint-space objective follows this entry.
External link:
http://arxiv.org/abs/2308.06125
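The cross-modal space this abstract alludes to is often learned with a CLIP-style symmetric contrastive loss over paired speech and text embeddings. The sketch below shows that generic objective; the encoder outputs, batch size, and temperature are placeholders, not necessarily this paper's training setup.

```python
import torch
import torch.nn.functional as F

def joint_space_contrastive_loss(speech_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss: the i-th speech embedding should be most
    similar to the i-th text embedding, and vice versa."""
    s = F.normalize(speech_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = s @ t.T / temperature              # [batch, batch] similarities
    targets = torch.arange(s.size(0))           # matched pairs on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Stand-in encoder outputs for a batch of 8 paired utterances/transcripts.
loss = joint_space_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```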
Author:
Peyser, Cal, Picheny, Michael, Cho, Kyunghyun, Prabhavalkar, Rohit, Huang, Ronny, Sainath, Tara
Published in:
2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Unpaired text and audio injection have emerged as dominant methods for improving ASR performance in the absence of a large labeled corpus. However, little guidance exists on deploying these methods to improve production ASR systems that are trained on…
External link:
http://arxiv.org/abs/2304.11053
Author:
Peyser, Cal, Huang, Ronny, Sainath, Tara, Prabhavalkar, Rohit, Picheny, Michael, Cho, Kyunghyun
Dual learning is a paradigm for semi-supervised machine learning that seeks to leverage unsupervised data by solving two opposite tasks at once. In this scheme, each model is used to generate pseudo-labels for unlabeled examples that are used to train the other model… A sketch of this loop follows this entry.
External link:
http://arxiv.org/abs/2301.04327
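A minimal sketch of the loop the abstract describes, assuming a speech-to-text model (`asr`) and a text-to-speech counterpart (`tts`); the objects and their methods are hypothetical placeholders, not a real API.

```python
def dual_learning_round(asr, tts, unpaired_audio, unpaired_text):
    """One round of dual learning between two opposite tasks."""
    # Each model pseudo-labels the data its counterpart lacks labels for.
    pseudo_text = [asr.transcribe(a) for a in unpaired_audio]   # audio -> text
    pseudo_audio = [tts.synthesize(t) for t in unpaired_text]   # text -> audio

    # The resulting pseudo-pairs supervise the *other* model.
    asr.train_on(zip(pseudo_audio, unpaired_text))  # (audio, text) pairs
    tts.train_on(zip(pseudo_text, unpaired_audio))  # (text, audio) pairs
```

Iterating this round lets both models bootstrap from unlabeled data, at the cost of feeding back each other's errors.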
Author:
Huang, W. Ronny, Chang, Shuo-Yiin, Sainath, Tara N., He, Yanzhang, Rybach, David, David, Robert, Prabhavalkar, Rohit, Allauzen, Cyril, Peyser, Cal, Strohman, Trevor D.
We explore unifying a neural segmenter with two-pass cascaded encoder ASR into a single model. A key challenge is allowing the segmenter (which runs in real-time, synchronously with the decoder) to finalize the 2nd pass (which runs 900 ms behind real time)…
External link:
http://arxiv.org/abs/2211.15432
The careful construction of audio representations has become a dominant feature in the design of approaches to many speech tasks. Increasingly, such approaches have emphasized "disentanglement", where a representation contains only parts of the speech signal… A generic disentanglement sketch follows this entry.
External link:
http://arxiv.org/abs/2208.13191
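One standard recipe for such disentanglement, offered here as a generic illustration rather than this paper's method, is adversarial training with gradient reversal: a classifier tries to predict a nuisance factor (say, speaker identity) from the content representation, and the reversed gradient drives that information out of it.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates the gradient in the backward."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -grad

class DisentangledEncoder(nn.Module):
    def __init__(self, dim=256, n_speakers=100):
        super().__init__()
        self.content = nn.Linear(dim, dim)             # toy "content" branch
        self.speaker_adv = nn.Linear(dim, n_speakers)  # adversarial head

    def forward(self, x):
        c = self.content(x)
        # The adversary receives reversed gradients, so minimizing its
        # loss pushes speaker information *out* of the content vector.
        speaker_logits = self.speaker_adv(GradReverse.apply(c))
        return c, speaker_logits

enc = DisentangledEncoder()
content, logits = enc(torch.randn(4, 256))
print(content.shape, logits.shape)  # torch.Size([4, 256]) torch.Size([4, 100])
```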
Author:
Huang, W. Ronny, Chang, Shuo-yiin, Rybach, David, Prabhavalkar, Rohit, Sainath, Tara N., Allauzen, Cyril, Peyser, Cal, Lu, Zhiyun
Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours in length is an ongoing challenge in speech recognition. A common solution is to segment the audio in advance using a separate voice activity detector… A toy version of that baseline follows this entry.
External link:
http://arxiv.org/abs/2204.10749
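The pre-segmentation baseline named in the abstract can be mimicked with a toy frame-energy VAD; real systems use learned VADs, so the frame size and threshold here are arbitrary assumptions.

```python
import numpy as np

def energy_vad_segments(audio, sr=16000, frame_ms=30, threshold=1e-3):
    """Toy VAD: return (start, end) sample ranges of contiguous
    frames whose mean energy exceeds `threshold`."""
    frame = int(sr * frame_ms / 1000)
    voiced = [np.mean(audio[i * frame:(i + 1) * frame] ** 2) > threshold
              for i in range(len(audio) // frame)]
    segments, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i * frame                      # segment opens
        elif not v and start is not None:
            segments.append((start, i * frame))    # segment closes
            start = None
    if start is not None:
        segments.append((start, len(voiced) * frame))
    return segments

# One second of silence followed by one second of noise ("speech").
audio = np.concatenate([np.zeros(16000), 0.1 * np.random.randn(16000)])
print(energy_vad_segments(audio))  # roughly one segment in the second half
```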
Author:
Wang, Weiran, Chen, Tongzhou, Sainath, Tara N., Variani, Ehsan, Prabhavalkar, Rohit, Huang, Ronny, Ramabhadran, Bhuvana, Gaur, Neeraj, Mavandadi, Sepand, Peyser, Cal, Strohman, Trevor, He, Yanzhang, Rybach, David
Language models (LMs) significantly improve the recognition accuracy of end-to-end (E2E) models on words rarely seen during training, when used in either the shallow fusion or the rescoring setups. In this work, we introduce LMs in the learning of hybrid autoregressive transducer (HAT) models… A shallow-fusion sketch follows this entry.
External link:
http://arxiv.org/abs/2204.07553
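For reference, shallow fusion, one of the two setups named above, ranks each candidate token by its E2E log-probability plus a weighted LM log-probability; the weight and toy vocabulary below are illustrative.

```python
import math

def shallow_fusion_scores(e2e_log_probs, lm_log_probs, lam=0.3):
    """Per-token fused scores used to rank beam-search candidates.

    `lam` is the fusion weight, normally tuned on a dev set; tokens the
    LM has never scored fall back to -inf here for simplicity.
    """
    return {tok: lp + lam * lm_log_probs.get(tok, -math.inf)
            for tok, lp in e2e_log_probs.items()}

e2e = {"cal": -0.6, "call": -0.7}   # E2E model slightly prefers "cal"
lm = {"cal": -3.0, "call": -0.5}    # external LM strongly prefers "call"
fused = shallow_fusion_scores(e2e, lm)
print(max(fused, key=fused.get))    # "call": fusion flips the decision
```

Rescoring, the other setup, applies the same interpolation to complete hypotheses after beam search rather than per decoding step.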
Author:
Huang, W. Ronny, Peyser, Cal, Sainath, Tara N., Pang, Ruoming, Strohman, Trevor, Kumar, Shankar
Language model fusion helps smart assistants recognize words which are rare in acoustic data but abundant in text-only corpora (typed search logs). However, such corpora have properties that hinder downstream performance, including being (1) too large…
External link:
http://arxiv.org/abs/2203.05008
Author:
Huang, W. Ronny, Sainath, Tara N., Peyser, Cal, Kumar, Shankar, Rybach, David, Strohman, Trevor
We introduce Lookup-Table Language Models (LookupLM), a method for scaling up the size of RNN language models with only a constant increase in the floating point operations, by increasing the expressivity of the embedding table. In particular, we insert… A toy illustration of the constant-FLOPs idea follows this entry.
External link:
http://arxiv.org/abs/2104.04552
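The snippet cuts off mid-sentence, but the stated goal, a bigger embedding table at constant FLOP cost, can be illustrated by adding a second, n-gram-keyed embedding lookup to the usual token lookup: table lookups are O(1) regardless of table size. The bigram hashing scheme and sizes below are assumptions for illustration, not the paper's exact design.

```python
import torch
from torch import nn

class LookupLMEmbedding(nn.Module):
    """Toy sketch: augment the token embedding with a bigram-keyed table.

    Growing `buckets` enlarges the model's parameter count while the
    per-step compute stays a single extra lookup and add.
    """
    def __init__(self, vocab=1000, dim=128, buckets=50000):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.ngram = nn.Embedding(buckets, dim)
        self.vocab, self.buckets = vocab, buckets

    def forward(self, prev_tok, cur_tok):
        # Hash the (previous, current) token pair into a table bucket.
        bigram_id = (prev_tok * self.vocab + cur_tok) % self.buckets
        return self.tok(cur_tok) + self.ngram(bigram_id)

emb = LookupLMEmbedding()
prev, cur = torch.tensor([3]), torch.tensor([17])
print(emb(prev, cur).shape)  # torch.Size([1, 128])
```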