Search results - "Kannan, Anjuli"

Report

Language model fusion for streaming end to end speech recognition

Authors: Cabrera, Rodrigo, Liu, Xiaofeng, Ghodsi, Mohammadreza, Matteson, Zebulun, Weinstein, Eugene, Kannan, Anjuli

Streaming processing of speech audio is required for many contemporary practical speech recognition tasks. Even with the large corpora of manually transcribed speech data available today, it is impossible for such corpora to cover adequately the long

External link: http://arxiv.org/abs/2104.04487

Report

Language-agnostic Multilingual Modeling

Authors: Datta, Arindrima, Ramabhadran, Bhuvana, Emond, Jesse, Kannan, Anjuli, Roark, Brian

Multilingual Automated Speech Recognition (ASR) systems allow for the joint training of data-rich and data-scarce languages in a single model. This enables data and parameter sharing across languages, which is especially beneficial for the data-scarc

External link: http://arxiv.org/abs/2004.09571

Report

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency

Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking.

External link: http://arxiv.org/abs/2003.12710

Report

A comparison of end-to-end models for long-form speech recognition

Authors: Chiu, Chung-Cheng, Han, Wei, Zhang, Yu, Pang, Ruoming, Kishchenko, Sergey, Nguyen, Patrick, Narayanan, Arun, Liao, Hank, Zhang, Shuyuan, Kannan, Anjuli, Prabhavalkar, Rohit, Chen, Zhifeng, Sainath, Tara, Wu, Yonghui

End-to-end automatic speech recognition (ASR) models, including both attention-based models and the recurrent neural network transducer (RNN-T), have shown superior performance compared to conventional systems. However, previous studies have focused

External link: http://arxiv.org/abs/1911.02242

Report

Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model

Authors: Kannan, Anjuli, Datta, Arindrima, Sainath, Tara N., Weinstein, Eugene, Ramabhadran, Bhuvana, Wu, Yonghui, Bapna, Ankur, Chen, Zhifeng, Lee, Seungji

Multilingual end-to-end (E2E) models have shown great promise in expansion of automatic speech recognition (ASR) coverage of the world's languages. They have shown improvement over monolingual systems, and have simplified training and serving by elim

External link: http://arxiv.org/abs/1909.05330

Report

Extracting Symptoms and their Status from Clinical Conversations

Authors: Du, Nan, Chen, Kai, Kannan, Anjuli, Tran, Linh, Chen, Yuhui, Shafran, Izhak

Published in: Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2019

This paper describes novel models tailored for a new application, that of extracting the symptoms mentioned in clinical conversations along with their status. Lack of any publicly available corpus in this privacy-sensitive domain led us to develop ou

External link: http://arxiv.org/abs/1906.02239

Report

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

Authors: Shen, Jonathan, Nguyen, Patrick, Wu, Yonghui, Chen, Zhifeng, Chen, Mia X., Jia, Ye, Kannan, Anjuli, Sainath, Tara, Cao, Yuan, Chiu, Chung-Cheng, He, Yanzhang, Chorowski, Jan, Hinsu, Smit, Laurenzo, Stella, Qin, James, Firat, Orhan, Macherey, Wolfgang, Gupta, Suyog, Bapna, Ankur, Zhang, Shuyuan, Pang, Ruoming, Weiss, Ron J., Prabhavalkar, Rohit, Liang, Qiao, Jacob, Benoit, Liang, Bowen, Lee, HyoukJoong, Chelba, Ciprian, Jean, Sébastien, Li, Bo, Johnson, Melvin, Anil, Rohan, Tibrewal, Rajat, Liu, Xiaobing, Eriguchi, Akiko, Jaitly, Navdeep, Ari, Naveen, Cherry, Colin, Haghani, Parisa, Good, Otavio, Cheng, Youlong, Alvarez, Raziel, Caswell, Isaac, Hsu, Wei-Ning, Yang, Zongheng, Wang, Kuan-Chieh, Gonina, Ekaterina, Tomanek, Katrin, Vanik, Ben, Wu, Zelin, Jones, Llion, Schuster, Mike, Huang, Yanping, Chen, Dehao, Irie, Kazuki, Foster, George, Richardson, John, Macherey, Klaus, Bruguier, Antoine, Zen, Heiga, Raffel, Colin, Kumar, Shankar, Rao, Kanishka, Rybach, David, Murray, Matthew, Peddinti, Vijayaditya, Krikun, Maxim, Bacchiani, Michiel A. U., Jablin, Thomas B., Suderman, Rob, Williams, Ian, Lee, Benjamin, Bhatia, Deepti, Carlson, Justin, Yavuz, Semih, Zhang, Yu, McGraw, Ian, Galkin, Max, Ge, Qi, Pundak, Golan, Whipkey, Chad, Wang, Todd, Alon, Uri, Lepikhin, Dmitry, Tian, Ye, Sabour, Sara, Chan, William, Toshniwal, Shubham, Liao, Baohua, Nirschl, Michael, Rondon, Pat

Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily ex

External link: http://arxiv.org/abs/1902.08295

Report

On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition

Authors: Irie, Kazuki, Prabhavalkar, Rohit, Kannan, Anjuli, Bruguier, Antoine, Rybach, David, Nguyen, Patrick

In conventional speech recognition, phoneme-based models outperform grapheme-based models for non-phonetic languages such as English. The performance gap between the two typically reduces as the amount of training data is increased. In this work, we

External link: http://arxiv.org/abs/1902.01955

Report

Streaming End-to-end Speech Recognition For Mobile Devices

End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. E2E models, however, present numerous challenges: In order to be truly useful, such models must decod

External link: http://arxiv.org/abs/1811.06621

Report

Deep context: end-to-end contextual speech recognition

Authors: Pundak, Golan, Sainath, Tara N., Prabhavalkar, Rohit, Kannan, Anjuli, Zhao, Ding

In automatic speech recognition (ASR) what a user says depends on the particular context she is in. Typically, this context is represented as a set of word n-grams. In this work, we present a novel, all-neural, end-to-end (E2E) ASR system that util

External link: http://arxiv.org/abs/1808.02480
