Showing 1 - 10 of 64 for search: '"Ng, Tim"'
This paper introduces a novel training framework called Focused Discriminative Training (FDT) to further improve streaming word-piece end-to-end (E2E) automatic speech recognition (ASR) models trained using either CTC or an interpolation of CTC and a
External link:
http://arxiv.org/abs/2408.13008
Author:
Xu, Mingbin, Jin, Alex, Wang, Sicheng, Su, Mu, Ng, Tim, Mason, Henry, Han, Shiyi, Lei, Zhihong, Deng, Yaqiao, Huang, Zhen, Krishnamoorthy, Mahesh
With increasingly more powerful compute capabilities and resources in today's devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it is still c
External link:
http://arxiv.org/abs/2312.10359
Author:
Alastruey, Belen, Sperber, Matthias, Gollan, Christian, Telaar, Dominic, Ng, Tim, Agarwal, Aashish
Code-switching (CS), i.e. mixing different languages in a single sentence, is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results f
External link:
http://arxiv.org/abs/2310.12648
Author:
Lei, Zhihong, Pusateri, Ernest, Han, Shiyi, Liu, Leo, Xu, Mingbin, Ng, Tim, Travadi, Ruchir, Zhang, Youyuan, Hannemann, Mirko, Siu, Man-Hung, Huang, Zhen
Recent advances in deep learning and automatic speech recognition have improved the accuracy of end-to-end speech recognition systems, but recognition of personal content such as contact names remains a challenge. In this work, we describe our person
External link:
http://arxiv.org/abs/2310.09988
Author:
Lei, Zhihong, Xu, Mingbin, Han, Shiyi, Liu, Leo, Huang, Zhen, Ng, Tim, Zhang, Yuanyuan, Pusateri, Ernest, Hannemann, Mirko, Deng, Yaqiao, Siu, Man-Hung
Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model
External link:
http://arxiv.org/abs/2310.07062
Maximum mutual information (MMI) has become one of the two de facto methods for sequence-level training of speech recognition acoustic models. This paper aims to isolate, identify and bring forward the implicit modelling decisions induced by the desi
External link:
http://arxiv.org/abs/2210.08918
The Listen, Attend and Spell (LAS) model and other attention-based automatic speech recognition (ASR) models have known limitations when operated in a fully online mode. In this paper, we analyze the online operation of LAS models to demonstrate that
External link:
http://arxiv.org/abs/2008.05514
Very deep CNNs achieve state-of-the-art results in both computer vision and speech recognition, but are difficult to train. The most popular way to train very deep CNNs is to use shortcut connections (SC) together with batch normalization (BN). Inspi
External link:
http://arxiv.org/abs/1910.01992
We start by considering binary words containing the minimum possible numbers of squares and antisquares (where an antisquare is a word of the form $x \overline{x}$), and we completely classify which possibilities can occur. We consider avoiding $x p(
External link:
http://arxiv.org/abs/1904.09157
Author:
Ng, Tim, author
Published in:
Well-Being: Expanding the Definition of Progress : Insights From Practitioners, Researchers, and Innovators From Around the Globe, 2020, ill.
External link:
https://doi.org/10.1093/oso/9780190080495.003.0009