Showing 1 - 10 of 32 for search: '"Puming Zhan"'
RNN-T models have gained popularity in the literature and in commercial systems because of their competitiveness and capability of operating in online streaming mode. In this work, we conduct an extensive study comparing several prediction network ar
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::24ef9ccd34444aad07fdef2196692c6b
http://arxiv.org/abs/2206.14618
Author:
Felix Weninger, Marco Gaudesi, Md Akmal Haidar, Nicola Ferri, Jesús Andrés-Ferrer, Puming Zhan
In this paper, we present an in-depth study on online attention mechanisms and distillation techniques for dual-mode (i.e., joint online and offline) ASR using the Conformer Transducer. In the dual-mode Conformer Transducer model, layers can function
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::371f7689386302125eca948fa07a8ca8
http://arxiv.org/abs/2206.11157
End-2-end (E2E) models have become increasingly popular in some ASR tasks because of their performance and advantages. These E2E models directly approximate the posterior distribution of tokens given the acoustic inputs. Consequently, the E2E systems
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ff7cc7c9428c67eb8297349eab4f6df9
End-to-end (E2E) multi-channel ASR systems show state-of-the-art performance in far-field ASR tasks by joint training of a multi-channel front-end along with the ASR model. The main limitation of such systems is that they are usually trained with dat
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::17e0b2070855f9a30ed1a4d35b085708
Published in:
INTERSPEECH
In this paper, we apply Semi-Supervised Learning (SSL) along with Data Augmentation (DA) for improving the accuracy of End-to-End ASR. We focus on the consistency regularization principle, which has been successfully applied to image classification t
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4bc46f7beed87cdad035d8f2200424be
Published in:
ASRU
Deep Neural Network (DNN) acoustic models are sensitive to the mismatch between training and testing environments. When a trained model is tested on unseen speakers, domain, or environment, recognition accuracy can degrade substantially. In such a ca
Published in:
INTERSPEECH
Published in:
INTERSPEECH
Sequence-to-sequence (seq2seq) based ASR systems have shown state-of-the-art performances while having clear advantages in terms of simplicity. However, comparisons are mostly done on speaker independent (SI) ASR systems, though speaker adapted conve
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6474c564fe6117b19bf26f724e24e3d3
http://arxiv.org/abs/1907.04916
Published in:
ASRU
Use of both manually and automatically labelled data for model training is referred to as semi-supervised training. While semi-supervised acoustic model training has been well-explored in the context of hidden Markov Gaussian mixture models (HMM-GMMs
Published in:
ICASSP
One of the objectives in acoustic modeling is to realize robust statistical models against the wide variety of acoustic conditions that are present in real world environments. As large amounts of training data become available, modeling subsets of th