Výsledky vyhledávání - "Huang, W. Ronny"

Report

Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study

Autor: Huang, W. Ronny, Allauzen, Cyril, Chen, Tongzhou, Gupta, Kilol, Hu, Ke, Qin, James, Zhang, Yu, Wang, Yongqiang, Chang, Shuo-Yiin, Sainath, Tara N.

In the era of large models, the autoregressive nature of decoding often results in latency serving as a significant bottleneck. We propose a non-autoregressive LM-fused ASR system that effectively leverages the parallelization capabilities of acceler

Externí odkaz: http://arxiv.org/abs/2401.12789

Zobrazit plný text záznamu

Report

Large-scale Language Model Rescoring on Long-form Data

Autor: Chen, Tongzhou, Allauzen, Cyril, Huang, Yinghui, Park, Daniel, Rybach, David, Huang, W. Ronny, Cabrera, Rodrigo, Audhkhasi, Kartik, Ramabhadran, Bhuvana, Moreno, Pedro J., Riley, Michael

Publikováno v: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this work, we study the impact of Large-scale Language Models (LLM) on Automated Speech Recognition (ASR) of YouTube videos, which we use as a source for long-form ASR. We demonstrate up to 8\% relative reduction in Word Error Eate (WER) on US Eng

Externí odkaz: http://arxiv.org/abs/2306.08133

Zobrazit plný text záznamu

Report

Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

Autor: Huang, W. Ronny, Zhang, Hao, Kumar, Shankar, Chang, Shuo-yiin, Sainath, Tara N.

We propose a method of segmenting long-form speech by separating semantically complete sentences within the utterance. This prevents the ASR decoder from needlessly processing faraway context while also preventing it from missing relevant context wit

Externí odkaz: http://arxiv.org/abs/2305.18419

Zobrazit plný text záznamu

Report

E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model

Autor: Huang, W. Ronny, Chang, Shuo-Yiin, Sainath, Tara N., He, Yanzhang, Rybach, David, David, Robert, Prabhavalkar, Rohit, Allauzen, Cyril, Peyser, Cal, Strohman, Trevor D.

We explore unifying a neural segmenter with two-pass cascaded encoder ASR into a single model. A key challenge is allowing the segmenter (which runs in real-time, synchronously with the decoder) to finalize the 2nd pass (which runs 900 ms behind real

Externí odkaz: http://arxiv.org/abs/2211.15432

Zobrazit plný text záznamu

Report

Modular Hybrid Autoregressive Transducer

Autor: Meng, Zhong, Chen, Tongzhou, Prabhavalkar, Rohit, Zhang, Yu, Wang, Gary, Audhkhasi, Kartik, Emond, Jesse, Strohman, Trevor, Ramabhadran, Bhuvana, Huang, W. Ronny, Variani, Ehsan, Huang, Yinghui, Moreno, Pedro J.

Publikováno v: 2022 IEEE Spoken Language Technology Workshop (SLT), Doha, Qatar

Text-only adaptation of a transducer model remains challenging for end-to-end speech recognition since the transducer has no clearly separated acoustic model (AM), language model (LM) or blank model. In this work, we propose a modular hybrid autoregr

Externí odkaz: http://arxiv.org/abs/2210.17049

Zobrazit plný text záznamu

Report

E2E Segmenter: Joint Segmenting and Decoding for Long-Form ASR

Autor: Huang, W. Ronny, Chang, Shuo-yiin, Rybach, David, Prabhavalkar, Rohit, Sainath, Tara N., Allauzen, Cyril, Peyser, Cal, Lu, Zhiyun

Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours in length is an ongoing challenge in speech recognition. A common solution is to segment the audio in advance using a separate voice activity detector

Externí odkaz: http://arxiv.org/abs/2204.10749

Zobrazit plný text záznamu

Report

Detecting Unintended Memorization in Language-Model-Fused ASR

Autor: Huang, W. Ronny, Chien, Steve, Thakkar, Om, Mathews, Rajiv

End-to-end (E2E) models are often being accompanied by language models (LMs) via shallow fusion for boosting their overall quality as well as recognition of rare words. At the same time, several prior works show that LMs are susceptible to unintentio

Externí odkaz: http://arxiv.org/abs/2204.09606

Zobrazit plný text záznamu

Report

Sentence-Select: Large-Scale Language Model Data Selection for Rare-Word Speech Recognition

Autor: Huang, W. Ronny, Peyser, Cal, Sainath, Tara N., Pang, Ruoming, Strohman, Trevor, Kumar, Shankar

Language model fusion helps smart assistants recognize words which are rare in acoustic data but abundant in text-only corpora (typed search logs). However, such corpora have properties that hinder downstream performance, including being (1) too larg

Externí odkaz: http://arxiv.org/abs/2203.05008

Zobrazit plný text záznamu

Report

Capitalization Normalization for Language Modeling with an Accurate and Efficient Hierarchical RNN Model

Autor: Zhang, Hao, Cheng, You-Chi, Kumar, Shankar, Huang, W. Ronny, Chen, Mingqing, Mathews, Rajiv

Capitalization normalization (truecasing) is the task of restoring the correct case (uppercase or lowercase) of noisy text. We propose a fast, accurate and compact two-level hierarchical word-and-character-based recurrent neural network model. We use

Externí odkaz: http://arxiv.org/abs/2202.08171

Zobrazit plný text záznamu

Report

Scaling End-to-End Models for Large-Scale Multilingual ASR

Autor: Li, Bo, Pang, Ruoming, Sainath, Tara N., Gulati, Anmol, Zhang, Yu, Qin, James, Haghani, Parisa, Huang, W. Ronny, Ma, Min, Bai, Junwen

Building ASR models across many languages is a challenging multi-task learning problem due to large variations and heavily unbalanced data. Existing work has shown positive transfer from high resource to low resource languages. However, degradations

Externí odkaz: http://arxiv.org/abs/2104.14830

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání