Showing 1 - 10 of 14 for search: '"Jeremy H. M. Wong"'
Author:
Jeremy H. M. Wong, Yifan Gong
Speakers may move around while diarisation is being performed. When a microphone array is used, the instantaneous locations of where the sounds originated from can be estimated, and previous investigations have shown that such information can be comp
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::19ba865ea5c8929f2d1f9e538135bffb
http://arxiv.org/abs/2109.11140
Previous works have shown that spatial location information can be complementary to speaker embeddings for a speaker diarisation task. However, the models used often assume that speakers are fairly stationary throughout a meeting. This paper proposes
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::672205706c82576e385c218db988e8bf
http://arxiv.org/abs/2109.10598
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing. 27:1725-1736
In automatic speech recognition, performance gains can often be obtained by combining an ensemble of multiple models. However, this can be computationally expensive when performing recognition. Teacher–student learning alleviates this cost by train
Author:
Yifan Gong, George Polovets, Kenichi Kumatani, Eric Sun, Yashesh Gaur, Jinyu Li, Jeremy H. M. Wong, Partha Parthasarathy, Dimitrios Dimitriadis
Published in:
ICASSP
Hypothesis-level combination between multiple models can often yield gains in speech recognition. However, all models in the ensemble are usually restricted to use the same audio segmentation times. This paper proposes to generalise hypothesis-level
Published in:
ICASSP
Speaker diarisation methods often rely on speaker embeddings to cluster together the segments of audio that are uttered by the same speaker. When the audio is captured using a microphone array, it is possible to estimate the locations of where the so
Published in:
INTERSPEECH
Recent studies suggest that it may now be possible to construct end-to-end Neural Network (NN) models that perform on-par with, or even outperform, hybrid models in speech recognition. These models differ in their designs, and as such, may exhibit di
Published in:
ICASSP
While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion, we argue th
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::352d6256bec65f6266e5c03e13305066
http://arxiv.org/abs/2003.07482
Published in:
ASRU
Teacher-student learning can be applied in automatic speech recognition for model compression and domain adaptation. This trains a student model to emulate the behaviour of a teacher model, and only the student is used to perform recognition. Dependi
Language modelling is a crucial component in a wide range of applications including speech recognition. Language models (LMs) are usually constructed by splitting a sentence into words and computing the probability of a word based on its word history
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1ece13c09e1e6fbf1664ab3756ef2ed8
Sequence Teacher-Student Training of Acoustic Models for Automatic Free Speaking Language Assessment
Published in:
SLT
A high performance automatic speech recognition (ASR) system is an important constituent component of an automatic language assessment system for free speaking language tests. The ASR system is required to be capable of recognising non-native spontan