Showing 1 - 10 of 30 for search: '"Sri Harish Mallidi"'
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing. 28:646-655
Attention-based methods and Connectionist Temporal Classification (CTC) network have been promising research directions for end-to-end (E2E) Automatic Speech Recognition (ASR). The joint CTC/Attention model has achieved great success by utilizing bot
Authors:
Roland Maas, Che-Wei Huang, Minhua Wu, Di He, Ariya Rastrow, Jasha Droppo, Samik Sadhu, Sri Harish Mallidi, Andreas Stolcke
Wav2vec-C introduces a novel representation learning technique combining elements from wav2vec 2.0 and VQ-VAE. Our model learns to reproduce quantized representations from partially masked speech encoding using a contrastive loss in a way similar to
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3a5fd8d5db1f3b4bb1efe3a0622a42d9
Authors:
Xiaosu Tong, Sri Harish Mallidi, Che-Wei Huang, Chander Chandak, Ariya Rastrow, Sonal Pareek, Roland Maas, Shaun N. Joseph
Published in:
SLT
In this paper, we propose a streaming model to distinguish voice queries intended for a smart-home device from background speech. The proposed model consists of multiple CNN layers with residual connections, followed by a stacked LSTM architecture. T
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::715d4896aab69f666461f5b60105f442
http://arxiv.org/abs/2007.09245
Published in:
INTERSPEECH
Authors:
Bjorn Hoffmeister, Kyle Goehner, Roland Maas, Ariya Rastrow, Sri Harish Mallidi, Spyros Matsoukas
Published in:
INTERSPEECH
In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants. Applications include rejection of false wake-ups or unintended interactions as well as enabl
Authors:
Matthew Wiesner, Martin Karafiat, Nelson Yalta, Shinji Watanabe, Takaaki Hori, Sri Harish Mallidi, Murali Karthick Baskar, Ruizhi Li, Jaejin Cho
Published in:
SLT
Sequence-to-sequence (seq2seq) approach for low-resource ASR is a relatively new direction in speech research. The approach benefits by performing model training without using lexicon and alignments. However, this poses a new problem of requiring mor
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3f69fc4fd11875ffbe485744ed625fb2
Published in:
ICASSP
Automatic Speech Recognition (ASR) using multiple microphone arrays has achieved great success in the far-field robustness. Taking advantage of all the information that each array shares and contributes is crucial in this task. Motivated by the advan
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2f629e37df5a27c41c31c7d8fc0c71bb
Published in:
ICASSP
In this paper we investigate methods to predict word error rates in automatic speech recognition in the presence of unknown noise types, which have not been seen during training. The performance measures operate on phoneme posteriorgrams that are obt
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing. 22:1285-1295
Speaker and language recognition in noisy and degraded channel conditions continue to be a challenging problem mainly due to the mismatch between clean training and noisy test conditions. In the presence of noise, the most reliable portions of the si