Showing 1 - 10 of 91 for search: '"Jasha Droppo"'
Published in:
2022 IEEE Spoken Language Technology Workshop (SLT).
Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model. It can be used to effectively initialize the encoder of an Auto…
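As an illustration only, not the implementation behind this record, CPC's mutual-information maximization is usually realized with an InfoNCE contrastive loss. The sketch below assumes batched context vectors, future latents, and a bilinear scoring matrix W; all names and shapes are hypothetical.

import numpy as np

def info_nce_loss(context, future_latents, W):
    """InfoNCE-style contrastive loss used by CPC (illustrative sketch).

    context        : (B, D_c)  context vectors c_t from an autoregressive model
    future_latents : (B, D_z)  encoder latents z_{t+k}; row i is the positive for
                               row i, all other rows in the batch act as negatives
    W              : (D_c, D_z) bilinear scoring matrix for prediction step k
    """
    scores = context @ W @ future_latents.T          # (B, B) pairwise scores f(c_i, z_j)
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability for the softmax
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; maximizing their log-softmax maximizes
    # a lower bound on the mutual information I(c_t; z_{t+k}).
    return -np.mean(np.diag(log_softmax))

# Toy usage with random vectors standing in for real encoder/context outputs.
rng = np.random.default_rng(0)
B, D_c, D_z = 8, 16, 16
loss = info_nce_loss(rng.normal(size=(B, D_c)),
                     rng.normal(size=(B, D_z)),
                     rng.normal(size=(D_c, D_z)))
print(f"InfoNCE loss on random data: {loss:.3f}")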
Author:
Arman Zharmagambetov, Qingming Tang, Chieh-Chi Kao, Qin Zhang, Ming Sun, Viktor Rozgic, Jasha Droppo, Chao Wang
Published in:
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Modern speaker verification models use deep neural networks to encode utterance audio into discriminative embedding vectors. During the training process, these networks are typically optimized to differentiate arbitrary speakers. This learning proces… (a minimal verification sketch follows this record.)
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3c65d26d8b49c5d1b0b0b12399e458c5
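A minimal, hypothetical sketch of the verification step the record above alludes to: encode two utterances into length-normalized embeddings and compare them by cosine similarity against a threshold. The toy embed function and the 40-dim features stand in for a real deep speaker encoder.

import numpy as np

def embed(utterance_features, projection):
    """Toy stand-in for a deep speaker encoder: mean-pool frame features,
    project, and length-normalize to get a fixed-size embedding."""
    pooled = utterance_features.mean(axis=0)          # (D_feat,)
    emb = projection @ pooled                         # (D_emb,)
    return emb / (np.linalg.norm(emb) + 1e-8)

def verify(emb_enrolled, emb_test, threshold=0.7):
    """Accept the test utterance if the cosine similarity to the enrolled
    speaker embedding exceeds the decision threshold."""
    score = float(emb_enrolled @ emb_test)            # cosine, since both are unit-norm
    return score, score >= threshold

rng = np.random.default_rng(1)
projection = rng.normal(size=(128, 40))               # hypothetical 40-dim acoustic features
enrolled = embed(rng.normal(size=(200, 40)), projection)
test = embed(rng.normal(size=(180, 40)), projection)
score, accepted = verify(enrolled, test)
print(f"cosine={score:.3f} accepted={accepted}")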
Author:
Oguz H. Elibol, Jasha Droppo
Published in:
Interspeech 2021.
There is a recent trend in machine learning to increase model quality by growing models to sizes previously thought to be unreasonable. Recent work has shown that autoregressive generative models with cross-entropy objective functions exhibit smooth…
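For illustration, the cross-entropy objective whose smooth behaviour the record refers to can be written down directly. The sketch below computes the average next-step cross-entropy of an autoregressive model over one sequence, with random logits standing in for real model outputs; all sizes are assumptions.

import numpy as np

def autoregressive_cross_entropy(logits, targets):
    """Average next-step cross-entropy (in nats) of an autoregressive model.

    logits  : (T, V) unnormalized predictions for step t given steps < t
    targets : (T,)   integer ids actually observed at each step
    Smooth improvement of this quantity as model and data sizes grow is the
    scaling behaviour the abstract above refers to.
    """
    logits = logits - logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(targets)), targets])

rng = np.random.default_rng(2)
T, V = 50, 100                                                  # hypothetical sequence length / vocabulary
print(autoregressive_cross_entropy(rng.normal(size=(T, V)),
                                   rng.integers(0, V, size=T)))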
Published in:
Interspeech 2021.
We propose a simple yet effective method to compress an RNN-Transducer (RNN-T) through the well-known knowledge distillation paradigm. We show that the transducer's encoder outputs naturally have a high entropy and contain rich information about acou…
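A hedged sketch of one common form of such distillation, not necessarily the authors' exact recipe: match the student's output distribution to the teacher's with a KL divergence averaged over the transducer lattice. Shapes and the temperature parameter are assumptions.

import numpy as np

def distillation_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) averaged over transducer lattice positions.

    Both tensors are (T, U, V): T acoustic frames, U label positions,
    V output symbols (including blank). Matching the student's distribution
    to the teacher's high-entropy outputs is the distillation signal.
    """
    def log_softmax(x):
        x = x / temperature
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))

    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    kl = np.exp(t_logp) * (t_logp - s_logp)          # elementwise p_T * log(p_T / p_S)
    return kl.sum(axis=-1).mean()                    # sum over symbols, mean over lattice

rng = np.random.default_rng(3)
T, U, V = 20, 10, 30                                  # hypothetical lattice / vocabulary sizes
print(distillation_kl(rng.normal(size=(T, U, V)), rng.normal(size=(T, U, V))))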
Published in:
Interspeech 2021.
Published in:
Interspeech 2021.
Text-to-speech systems have recently achieved almost indistinguishable quality from human speech. However, the prosody of those systems is generally flatter than natural speech, producing samples with low expressiveness. Disentanglement of speaker id and…
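As a hypothetical illustration of the disentanglement idea, the sketch below conditions a synthesizer on two separate vectors: a speaker embedding looked up by id and a prosody embedding pooled from a reference mel spectrogram, so that either factor can in principle be varied independently. All functions, dimensions, and tables are assumptions, not the paper's architecture.

import numpy as np

def prosody_embedding(mel_reference, proj):
    """Toy reference encoder: pool a mel spectrogram into a small prosody vector."""
    return np.tanh(proj @ mel_reference.mean(axis=0))

def conditioning(speaker_table, speaker_id, mel_reference, proj):
    """Concatenate a speaker embedding with a prosody embedding so the two
    factors can be swapped independently at synthesis time."""
    return np.concatenate([speaker_table[speaker_id],
                           prosody_embedding(mel_reference, proj)])

rng = np.random.default_rng(4)
speaker_table = rng.normal(size=(10, 64))             # hypothetical 10 speakers, 64-dim embeddings
proj = rng.normal(size=(16, 80))                      # hypothetical prosody projection (80 mel bins)
cond = conditioning(speaker_table, speaker_id=3,
                    mel_reference=rng.normal(size=(120, 80)), proj=proj)
print(cond.shape)                                     # (80,) conditioning vector fed to the decoder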
Published in:
Interspeech 2021.
Author:
Roland Maas, Jasha Droppo, Roberto Barra-Chicote, Yixiong Meng, Amin Fazel, Wei Yang, Yulan Liu
End-to-end (E2E) automatic speech recognition (ASR) models have recently demonstrated superior performance over traditional hybrid ASR models. Training an E2E ASR model requires a large amount of data, which is not only expensive but may also rais…
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::7c7ab09241c6ea402622a7b43a7286d8
http://arxiv.org/abs/2106.07803
Author:
Jasha Droppo, Peng Liu, Yuriy Mishchenko, Anish Shah, Roberto Barra Chicote, Jeff Condal, Andrew Werchniak
Published in:
ICASSP
The study of keyword spotting, a subfield within the broader field of speech recognition that centers on identifying individual keywords in speech audio, has gained particular importance in recent years with the rise of personal voice assistants…
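A minimal, assumed sketch of a classic keyword-spotting decision rule, not the system described in this record: smooth the per-frame keyword posterior from a small acoustic model and fire when the smoothed score crosses a threshold. The posteriors here are random placeholders.

import numpy as np

def detect_keyword(posteriors, keyword_index, window=30, threshold=0.8):
    """Flag a keyword when its smoothed per-frame posterior exceeds a threshold.

    posteriors    : (T, K) per-frame class posteriors from a small acoustic model
    keyword_index : column corresponding to the target keyword
    window        : number of frames averaged for smoothing
    """
    p = posteriors[:, keyword_index]
    if len(p) < window:
        return False, 0.0
    kernel = np.ones(window) / window
    smoothed = np.convolve(p, kernel, mode="valid")   # moving average over the window
    peak = float(smoothed.max())
    return peak >= threshold, peak

rng = np.random.default_rng(5)
T, K = 200, 5                                          # hypothetical stream length / number of classes
post = rng.dirichlet(np.ones(K), size=T)               # rows sum to 1, like real posteriors
fired, peak = detect_keyword(post, keyword_index=2)
print(f"detected={fired} peak={peak:.2f}")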