Showing 1 - 10 of 31 for search: '"Kannan, Anjuli"'
Author:
Cabrera, Rodrigo, Liu, Xiaofeng, Ghodsi, Mohammadreza, Matteson, Zebulun, Weinstein, Eugene, Kannan, Anjuli
Streaming processing of speech audio is required for many contemporary practical speech recognition tasks. Even with the large corpora of manually transcribed speech data available today, it is impossible for such corpora to cover adequately the long
External link:
http://arxiv.org/abs/2104.04487
Multilingual Automated Speech Recognition (ASR) systems allow for the joint training of data-rich and data-scarce languages in a single model. This enables data and parameter sharing across languages, which is especially beneficial for the data-scarc
External link:
http://arxiv.org/abs/2004.09571
A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency
Author:
Sainath, Tara N., He, Yanzhang, Li, Bo, Narayanan, Arun, Pang, Ruoming, Bruguier, Antoine, Chang, Shuo-yiin, Li, Wei, Alvarez, Raziel, Chen, Zhifeng, Chiu, Chung-Cheng, Garcia, David, Gruenstein, Alex, Hu, Ke, Jin, Minho, Kannan, Anjuli, Liang, Qiao, McGraw, Ian, Peyser, Cal, Prabhavalkar, Rohit, Pundak, Golan, Rybach, David, Shangguan, Yuan, Sheth, Yash, Strohman, Trevor, Visontai, Mirko, Wu, Yonghui, Zhang, Yu, Zhao, Ding
Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking.
External link:
http://arxiv.org/abs/2003.12710
Author:
Chiu, Chung-Cheng, Han, Wei, Zhang, Yu, Pang, Ruoming, Kishchenko, Sergey, Nguyen, Patrick, Narayanan, Arun, Liao, Hank, Zhang, Shuyuan, Kannan, Anjuli, Prabhavalkar, Rohit, Chen, Zhifeng, Sainath, Tara, Wu, Yonghui
End-to-end automatic speech recognition (ASR) models, including both attention-based models and the recurrent neural network transducer (RNN-T), have shown superior performance compared to conventional systems. However, previous studies have focused
External link:
http://arxiv.org/abs/1911.02242
Author:
Kannan, Anjuli, Datta, Arindrima, Sainath, Tara N., Weinstein, Eugene, Ramabhadran, Bhuvana, Wu, Yonghui, Bapna, Ankur, Chen, Zhifeng, Lee, Seungji
Multilingual end-to-end (E2E) models have shown great promise in expansion of automatic speech recognition (ASR) coverage of the world's languages. They have shown improvement over monolingual systems, and have simplified training and serving by elim
External link:
http://arxiv.org/abs/1909.05330
Published in:
Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2019
This paper describes novel models tailored for a new application, that of extracting the symptoms mentioned in clinical conversations along with their status. Lack of any publicly available corpus in this privacy-sensitive domain led us to develop ou
External link:
http://arxiv.org/abs/1906.02239
Author:
Shen, Jonathan, Nguyen, Patrick, Wu, Yonghui, Chen, Zhifeng, Chen, Mia X., Jia, Ye, Kannan, Anjuli, Sainath, Tara, Cao, Yuan, Chiu, Chung-Cheng, He, Yanzhang, Chorowski, Jan, Hinsu, Smit, Laurenzo, Stella, Qin, James, Firat, Orhan, Macherey, Wolfgang, Gupta, Suyog, Bapna, Ankur, Zhang, Shuyuan, Pang, Ruoming, Weiss, Ron J., Prabhavalkar, Rohit, Liang, Qiao, Jacob, Benoit, Liang, Bowen, Lee, HyoukJoong, Chelba, Ciprian, Jean, Sébastien, Li, Bo, Johnson, Melvin, Anil, Rohan, Tibrewal, Rajat, Liu, Xiaobing, Eriguchi, Akiko, Jaitly, Navdeep, Ari, Naveen, Cherry, Colin, Haghani, Parisa, Good, Otavio, Cheng, Youlong, Alvarez, Raziel, Caswell, Isaac, Hsu, Wei-Ning, Yang, Zongheng, Wang, Kuan-Chieh, Gonina, Ekaterina, Tomanek, Katrin, Vanik, Ben, Wu, Zelin, Jones, Llion, Schuster, Mike, Huang, Yanping, Chen, Dehao, Irie, Kazuki, Foster, George, Richardson, John, Macherey, Klaus, Bruguier, Antoine, Zen, Heiga, Raffel, Colin, Kumar, Shankar, Rao, Kanishka, Rybach, David, Murray, Matthew, Peddinti, Vijayaditya, Krikun, Maxim, Bacchiani, Michiel A. U., Jablin, Thomas B., Suderman, Rob, Williams, Ian, Lee, Benjamin, Bhatia, Deepti, Carlson, Justin, Yavuz, Semih, Zhang, Yu, McGraw, Ian, Galkin, Max, Ge, Qi, Pundak, Golan, Whipkey, Chad, Wang, Todd, Alon, Uri, Lepikhin, Dmitry, Tian, Ye, Sabour, Sara, Chan, William, Toshniwal, Shubham, Liao, Baohua, Nirschl, Michael, Rondon, Pat
Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily ex
External link:
http://arxiv.org/abs/1902.08295
Author:
Irie, Kazuki, Prabhavalkar, Rohit, Kannan, Anjuli, Bruguier, Antoine, Rybach, David, Nguyen, Patrick
In conventional speech recognition, phoneme-based models outperform grapheme-based models for non-phonetic languages such as English. The performance gap between the two typically reduces as the amount of training data is increased. In this work, we
External link:
http://arxiv.org/abs/1902.01955
Author:
He, Yanzhang, Sainath, Tara N., Prabhavalkar, Rohit, McGraw, Ian, Alvarez, Raziel, Zhao, Ding, Rybach, David, Kannan, Anjuli, Wu, Yonghui, Pang, Ruoming, Liang, Qiao, Bhatia, Deepti, Shangguan, Yuan, Li, Bo, Pundak, Golan, Sim, Khe Chai, Bagby, Tom, Chang, Shuo-yiin, Rao, Kanishka, Gruenstein, Alexander
End-to-end (E2E) models, which directly predict output character sequences given input speech, are good candidates for on-device speech recognition. E2E models, however, present numerous challenges: In order to be truly useful, such models must decod
External link:
http://arxiv.org/abs/1811.06621
In automatic speech recognition (ASR), what a user says depends on the particular context she is in. Typically, this context is represented as a set of word n-grams. In this work, we present a novel, all-neural, end-to-end (E2E) ASR system that util
External link:
http://arxiv.org/abs/1808.02480