Showing 1 - 10 of 10 for search: '"Wong, Jeremy H. M"'
Speech evaluation measures a learner's oral proficiency using automatic models. Corpora for training such models often pose sparsity challenges given that there is often limited scored data from teachers, in addition to the score distribution across p…
External link:
http://arxiv.org/abs/2409.14666
Author:
Ritter-Gutierrez, Fabian, Huang, Kuan-Po, Wong, Jeremy H. M, Ng, Dianwen, Lee, Hung-yi, Chen, Nancy F., Chng, Eng Siong
Deep learning models for speech rely on large datasets, presenting computational challenges. Yet, performance hinges on training data size. Dataset Distillation (DD) aims to learn a smaller dataset without much performance degradation when training w…
External link:
http://arxiv.org/abs/2406.02963
Author:
Ritter-Gutierrez, Fabian, Huang, Kuan-Po, Ng, Dianwen, Wong, Jeremy H. M., Lee, Hung-yi, Chng, Eng Siong, Chen, Nancy F.
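The entry above describes Dataset Distillation, i.e. learning a much smaller dataset that trains models almost as well as the full one. The sketch below shows one common flavour of the general idea (distribution matching of features under random encoders); it is illustrative only and not the method of the paper above, and the toy "real" data, sizes, and step counts are assumptions.

```python
# Minimal sketch of dataset distillation by feature-distribution matching.
# Illustrative only; not the method of the paper above. The toy "real"
# dataset and the small encoder stand in for real speech features.
import torch
import torch.nn as nn

torch.manual_seed(0)
real_x = torch.randn(1024, 40)                # stand-in for log-mel feature vectors
synth_x = nn.Parameter(torch.randn(32, 40))   # the small learned (distilled) dataset
opt = torch.optim.Adam([synth_x], lr=1e-2)

for step in range(200):
    # A freshly initialised encoder each step, so the synthetic data must
    # match real-feature statistics under many random feature extractors.
    encoder = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 64))
    for p in encoder.parameters():
        p.requires_grad_(False)
    loss = (encoder(real_x).mean(0) - encoder(synth_x).mean(0)).pow(2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("distilled set shape:", synth_x.shape)
```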
Compared to large speech foundation models, small distilled models exhibit degraded noise robustness. The student's robustness can be improved by introducing noise at the inputs during pre-training. Despite this, using the standard distillation loss …
External link:
http://arxiv.org/abs/2312.12153
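The entry above refers to the standard distillation setup with noise added at the student's inputs: the teacher sees the clean signal and the student is trained to match it from a noise-augmented copy. The sketch below shows only that baseline; the architectures, noise level, and L2 matching loss are illustrative assumptions, not the loss studied in the paper.

```python
# Standard teacher-student distillation with input noise on the student side.
# Architectures, noise scale and the MSE matching loss are assumptions made
# for illustration, not the loss analysed in the paper above.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 256)).eval()
student = nn.Sequential(nn.Linear(80, 128), nn.ReLU(), nn.Linear(128, 256))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    clean = torch.randn(16, 80)                     # stand-in for clean features
    noisy = clean + 0.1 * torch.randn_like(clean)   # additive-noise augmentation
    with torch.no_grad():
        target = teacher(clean)                     # teacher sees the clean input
    loss = nn.functional.mse_loss(student(noisy), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```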
The standard Gaussian Process (GP) only considers a single output sample per input in the training set. Datasets for subjective tasks, such as spoken language assessment, may be annotated with output labels from multiple human raters per input. This …
External link:
http://arxiv.org/abs/2306.02719
Author:
Wong, Jeremy H. M., Gong, Yifan
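The entry above contrasts a standard GP, which expects one label per input, with data where several raters score the same input. A naive baseline is to repeat each input once per rater and let the GP noise term absorb rater disagreement; the sketch below shows that baseline with scikit-learn, using synthetic 1-D inputs as an assumption, and is not the model proposed in the paper.

```python
# Naive way to feed multiple ratings per input to a standard GP: repeat the
# input row once per rater and let the noise kernel absorb disagreement.
# Baseline sketch only, not the approach of the paper above.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20, 1))                                   # 20 responses, 1-D feature
ratings = np.sin(X).ravel()[:, None] + 0.3 * rng.normal(size=(20, 3))  # 3 raters per response

X_rep = np.repeat(X, 3, axis=0)   # one row per (input, rater) pair
y_rep = ratings.ravel()

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(X_rep, y_rep)
mean, std = gp.predict(X, return_std=True)   # predictive mean and uncertainty per input
```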
Speakers may move around while diarisation is being performed. When a microphone array is used, the instantaneous locations of where the sounds originated from can be estimated, and previous investigations have shown that such information can be comp…
External link:
http://arxiv.org/abs/2109.11140
Previous works have shown that spatial location information can be complementary to speaker embeddings for a speaker diarisation task. However, the models used often assume that speakers are fairly stationary throughout a meeting. This paper proposes…
External link:
http://arxiv.org/abs/2109.10598
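Both entries above describe combining speaker embeddings with spatial (direction-of-arrival) information for diarisation. A minimal clustering-based sketch of that common idea follows: per-segment embeddings are concatenated with a DOA feature and clustered into speakers. The feature shapes, fusion weight, and choice of agglomerative clustering are illustrative assumptions, not the papers' models.

```python
# Sketch: fuse per-segment speaker embeddings with a direction-of-arrival (DOA)
# feature and cluster segments into speakers. Shapes, the fusion weight and the
# clustering choice are assumptions, not the papers' models.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
n_segments = 50
embeddings = rng.normal(size=(n_segments, 128))          # e.g. d-vectors per segment
doa = rng.uniform(-np.pi, np.pi, size=(n_segments, 1))   # estimated source angle per segment

# Encode the angle on the unit circle so that -pi and +pi are treated as close.
spatial = np.hstack([np.cos(doa), np.sin(doa)])

alpha = 0.5  # relative weight of the spatial evidence (assumption)
features = np.hstack([normalize(embeddings), alpha * spatial])

labels = AgglomerativeClustering(n_clusters=2).fit_predict(features)
print(labels[:10])   # cluster index = hypothesised speaker per segment
```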
While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion, we argue th…
External link:
http://arxiv.org/abs/2003.07482
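The entry above describes the conventional hybrid recipe: an LSTM acoustic model first trained with frame-level cross entropy, then refined with a sequence discriminative criterion. The sketch below shows only the first, cross-entropy stage on placeholder data; the sequence discriminative stage (e.g. MMI or sMBR) needs lattices or a denominator graph and is omitted, and all sizes are assumptions.

```python
# Frame-level cross-entropy stage of a hybrid LSTM acoustic model (sketch).
# The follow-up sequence discriminative stage (e.g. MMI / sMBR) requires
# lattices or a denominator graph and is not shown. Sizes are placeholders.
import torch
import torch.nn as nn

n_feats, n_states = 80, 3000          # input features, tied HMM states (senones)
lstm = nn.LSTM(n_feats, 512, num_layers=2, batch_first=True)
output_layer = nn.Linear(512, n_states)
opt = torch.optim.Adam(list(lstm.parameters()) + list(output_layer.parameters()), lr=1e-3)

feats = torch.randn(8, 200, n_feats)              # batch of 200-frame utterances
targets = torch.randint(0, n_states, (8, 200))    # forced-alignment state labels

hidden, _ = lstm(feats)
logits = output_layer(hidden)                     # (batch, time, n_states)
loss = nn.functional.cross_entropy(logits.reshape(-1, n_states), targets.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```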
The standard Gaussian Process (GP) only considers a single output sample per input in the training set. Datasets for subjective tasks, such as spoken language assessment, may be annotated with output labels from multiple human raters per input. This …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9a1ef9cf378f6630278fc7d4a6539639
http://arxiv.org/abs/2306.02719
Academic article
This result cannot be displayed to users who are not logged in. Log in to view it.
Academic article
This result cannot be displayed to users who are not logged in. Log in to view it.