Showing 1 - 10 of 29 results for search: '"Shuo-Yiin Chang"'
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Author:
Weiran Wang, Ding Zhao, Shaojin Ding, Hao Zhang, Shuo-Yiin Chang, David Rybach, Tara N. Sainath, Yanzhang He, Ian McGraw, Shankar Kumar
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Author:
Bo Li, Tara Sainath, Ruoming Pang, Shuo-Yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani
On-device end-to-end (E2E) models have shown improvements over a conventional model on English Voice Search tasks in both quality and latency. E2E models have also shown promising results for multilingual automatic speech recognition (ASR). In this p
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e69d8dfd3d8f4389b06eed47e083c02c
http://arxiv.org/abs/2208.13916
Author:
Shuo-Yiin Chang, Guru Prakash, Zelin Wu, Tara Sainath, Bo Li, Qiao Liang, Adam Stambler, Shyam Upadhyay, Manaal Faruqui, Trevor Strohman
In voice-enabled applications, a predetermined hotword is usually used to activate a device in order to attend to the query. However, speaking queries followed by a hotword each time introduces a cognitive burden in continued conversations. To avoid repe
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::50678679e5680829604b4b3647342512
http://arxiv.org/abs/2208.13322
While a streaming voice assistant system has been used in many applications, this system typically focuses on unnatural, one-shot interactions assuming input from a single voice query without hesitation or disfluency. However, a common conversational
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::293b1951bb805466d8a74467c8f7893b
Author:
Chao Zhang, Bo Li, Tara Sainath, Trevor Strohman, Sepand Mavandadi, Shuo-Yiin Chang, Parisa Haghani
Language identification is critical for many downstream tasks in automatic speech recognition (ASR), and is beneficial to integrate into multilingual end-to-end ASR as an additional task. In this paper, we propose to modify the structure of the casca
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::48bd4fe6333011053c8e723968f9842b
Author:
W. Ronny Huang, Shuo-Yiin Chang, David Rybach, Tara Sainath, Rohit Prabhavalkar, Cal Peyser, Zhiyun Lu, Cyril Allauzen
Improving the performance of end-to-end ASR models on long utterances ranging from minutes to hours in length is an ongoing challenge in speech recognition. A common solution is to segment the audio in advance using a separate voice activity detector
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e9c15fa03f1db0e0b6fc4f4d4951f31a
Author:
Wei Han, Tara N. Sainath, Bo Li, Yonghui Wu, Anmol Gulati, Arun Narayanan, Ruoming Pang, Shuo-Yiin Chang, Chung-Cheng Chiu, Yanzhang He, Jiahui Yu
Published in:
ICASSP
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible. However, emitting fast without degrading quality, as measured by word error rate (WER), is highly challenging. Existing approaches
Author:
Anmol Gulati, James Qin, Yonghui Wu, Yanzhang He, Yu Zhang, Tara N. Sainath, Trevor Strohman, Ruoming Pang, Arun Narayanan, Qiao Liang, Shuo-Yiin Chang, Chung-Cheng Chiu, Wei Han, Jiahui Yu, Bo Li
Published in:
ICASSP
End-to-end (E2E) models have been shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the model