Showing 1 - 10 of 19
for search: '"Yatharth Saraf"'
Author:
Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128 languages, an order of magnitude more public data than the largest known prior work. …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d9d1740f72468c26efa5658360ae3194
http://arxiv.org/abs/2111.09296
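A hedged sketch of how such a pretrained XLS-R checkpoint can be used for cross-lingual feature extraction, assuming the publicly released Hugging Face checkpoint name facebook/wav2vec2-xls-r-300m (the 1B/2B variants follow the same API); the waveform is a random placeholder.

```python
# Minimal sketch: extracting cross-lingual speech representations with a
# pretrained XLS-R (wav2vec 2.0) checkpoint via Hugging Face transformers.
# The checkpoint name is an assumption; larger variants load the same way.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

name = "facebook/wav2vec2-xls-r-300m"  # assumed public checkpoint
extractor = Wav2Vec2FeatureExtractor.from_pretrained(name)
model = Wav2Vec2Model.from_pretrained(name).eval()

waveform = torch.randn(16000).numpy()  # placeholder: 1 second of 16 kHz audio
inputs = extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state  # (1, frames, hidden_size)
print(features.shape)
```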
Published in:
ICASSP
Although speaker verification has conventionally been an audio-only task, some practical applications provide both audio and visual streams of input. In these cases, the visual stream provides complementary information and can often be leveraged …
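As a hedged illustration of combining a complementary visual stream with audio, here is a minimal late-fusion verification score; the embeddings, dimensions, and fusion weight are all assumptions, not the paper's method.

```python
# Minimal sketch: late score fusion of audio and visual speaker embeddings.
# Embedding source, dimensions, and fusion weight are illustrative.
import torch
import torch.nn.functional as F

def fused_score(enroll_audio, test_audio, enroll_visual, test_visual, w_audio=0.7):
    """Weighted combination of per-modality cosine similarities;
    accept the trial if the fused score clears a tuned threshold."""
    s_audio = F.cosine_similarity(enroll_audio, test_audio, dim=-1)
    s_visual = F.cosine_similarity(enroll_visual, test_visual, dim=-1)
    return w_audio * s_audio + (1.0 - w_audio) * s_visual

emb = lambda: torch.randn(256)  # placeholder speaker embeddings
print(fused_score(emb(), emb(), emb(), emb()))
```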
Author:
Kjell Schubert, Chunxi Liu, Julian Chan, Jun Liu, Pradyot Prakash, Fuchun Peng, Geoffrey Zweig, Yatharth Saraf, Frank Zhang, Ching-Feng Yeh, Xiaohui Zhang
Published in:
SLT
In this work, to measure the accuracy and efficiency for a latency-controlled streaming automatic speech recognition (ASR) application, we perform comprehensive evaluations on three popular training criteria: LF-MMI, CTC and RNN-T. In transcribing …
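For orientation, two of the three criteria named above are available off the shelf in PyTorch/torchaudio; the sketch below only shows their expected input shapes (LF-MMI has no comparable one-liner and lives in toolkits such as Kaldi or k2). Shapes and vocabulary size are illustrative.

```python
# Minimal sketch of the CTC and RNN-T criteria as exposed by PyTorch and
# torchaudio. LF-MMI is omitted: it needs toolkit support (e.g. Kaldi, k2).
import torch
import torchaudio.functional as TAF

T, U, V = 50, 10, 32                      # frames, target length, vocab size
targets = torch.randint(1, V, (1, U), dtype=torch.int32)
in_lens = torch.tensor([T], dtype=torch.int32)
tgt_lens = torch.tensor([U], dtype=torch.int32)

# CTC: per-frame log-probs of shape (time, batch, vocab), blank id 0.
log_probs = torch.randn(T, 1, V).log_softmax(-1)
ctc = torch.nn.CTCLoss(blank=0)(log_probs, targets, in_lens, tgt_lens)

# RNN-T: joint-network logits over every (frame, target-prefix) position.
joint = torch.randn(1, T, U + 1, V)
rnnt = TAF.rnnt_loss(joint, targets, in_lens, tgt_lens, blank=0)
print(ctc.item(), rnnt.item())
```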
Published in:
SLT
In this work, we exploit speech enhancement for improving a recurrent neural network transducer (RNN-T) based ASR system. We employ a dense convolutional recurrent network (DCRN) for complex spectral mapping based speech enhancement, and find it …
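A hedged sketch of the enhancement-then-recognition pipeline this describes: complex spectral mapping predicts the real and imaginary parts of the clean spectrogram before the audio reaches the recognizer. The single convolution below is only a stand-in for the paper's DCRN; STFT settings are assumptions.

```python
# Minimal sketch: complex spectral mapping as an ASR front end. One conv
# layer stands in for the DCRN; shapes and STFT settings are illustrative.
import torch
import torch.nn as nn

n_fft, hop = 512, 128
window = torch.hann_window(n_fft)
noisy = torch.randn(1, 16000)                       # placeholder noisy audio

spec = torch.stft(noisy, n_fft, hop, window=window, return_complex=True)
x = torch.view_as_real(spec).permute(0, 3, 1, 2)    # (batch, 2, freq, frames)

mapper = nn.Conv2d(2, 2, kernel_size=3, padding=1)  # stand-in for the DCRN
est = mapper(x).permute(0, 2, 3, 1).contiguous()    # (batch, freq, frames, 2)
enhanced = torch.istft(torch.view_as_complex(est), n_fft, hop,
                       window=window, length=noisy.shape[-1])
# `enhanced` would then be passed on to the RNN-T recognizer.
```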
Author:
Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf
Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::142928f89bf038f8ad4bc4bfc5ddad63
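A common instrument for such analyses is a linear probe on frozen features, sketched below; the feature source and downstream task are random placeholders, not the paper's benchmark.

```python
# Minimal sketch: a linear probe on frozen speech representations, the usual
# tool for analyzing what self-supervised features capture. Features and
# labels here are random placeholders.
import torch
import torch.nn as nn

features = torch.randn(200, 768)         # frozen utterance-level features
labels = torch.randint(0, 4, (200,))     # placeholder downstream task

probe = nn.Linear(768, 4)
opt = torch.optim.SGD(probe.parameters(), lr=0.1)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(features), labels)
    loss.backward()
    opt.step()
acc = (probe(features).argmax(-1) == labels).float().mean().item()
print("probe accuracy:", acc)
```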
Author:
Chunxi Liu, Michael Picheny, Leda Sari, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf
It is well known that many machine learning systems demonstrate bias towards specific groups of individuals. This problem has been studied extensively in the Facial Recognition area, but much less so in Automatic Speech Recognition (ASR). This paper …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::32a2faabe31f425418971a38c2cf4188
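The subgroup analysis such fairness studies rest on reduces to computing word error rate per demographic group; a minimal sketch with the jiwer package and made-up transcripts follows.

```python
# Minimal sketch: per-group word error rate, the basic measurement behind
# ASR bias studies. Uses the `jiwer` package; the transcripts are made up.
import jiwer

groups = {
    "group_a": (["turn on the lights"], ["turn on the light"]),
    "group_b": (["call my mother please"], ["call my brother please"]),
}
for name, (references, hypotheses) in groups.items():
    print(name, jiwer.wer(references, hypotheses))
# A large gap between per-group WERs is the bias signal being studied.
```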
Author:
Andros Tjandra, Diptanu Gon Choudhury, Frank Zhang, Kritika Singh, Alexis Conneau, Alexei Baevski, Assaf Sela, Yatharth Saraf, Michael Auli
Language identification greatly impacts the success of downstream tasks such as automatic speech recognition. Recently, self-supervised speech representations learned by wav2vec 2.0 have been shown to be very effective for a range of speech tasks. We …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::aede8a69be1d1d10628f5d34d47a2564
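One plausible reading of this setup, sketched below: utterance-level language classification on mean-pooled wav2vec 2.0 features. The checkpoint name, frozen backbone, and 25-way head are assumptions.

```python
# Minimal sketch: language identification as classification over pooled
# wav2vec 2.0 features. Checkpoint, pooling, and head size are assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

backbone = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()
head = nn.Linear(backbone.config.hidden_size, 25)  # e.g. 25 candidate languages

audio = torch.randn(1, 16000)                      # placeholder 16 kHz audio
with torch.no_grad():
    frames = backbone(audio).last_hidden_state     # (1, frames, hidden_size)
logits = head(frames.mean(dim=1))                  # mean-pool over time
predicted_language = logits.argmax(dim=-1)
```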
Author:
Yatharth Saraf, Michael L. Seltzer, Suyoun Kim, Christian Fuegen, Duc Le, Yangyang Shi, Ozlem Kalinli, Julian Chan, Gil Keren, Yuan Shangguan, Mahaveer Jain, Jay Mahadeokar
How to leverage dynamic contextual information in end-to-end speech recognition has remained an active research area. Previous solutions to this problem were either designed for specialized use cases that did not generalize well to open-domain scenarios …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6e25c8e6ece4b166bb1d6fa115444004
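A hedged sketch of one classic baseline for dynamic context, shallow-fusion biasing: during beam search, a candidate token gets a score bonus whenever it extends a phrase from the per-request biasing list. The function, token ids, and bonus value are illustrative, not the paper's method.

```python
# Minimal sketch: shallow-fusion contextual biasing during beam search.
# Token ids, the bonus value, and the helper name are illustrative.
from typing import List

def biased_score(base_logprob: float, hyp: List[int], candidate: int,
                 bias_phrases: List[List[int]], bonus: float = 2.0) -> float:
    """Boost the candidate's score when the extended hypothesis matches a
    prefix of any biasing phrase (e.g. a contact name or playlist title)."""
    extended = hyp + [candidate]
    n = len(extended)
    if any(phrase[:n] == extended for phrase in bias_phrases):
        return base_logprob + bonus
    return base_logprob

# Usage: hypothesis [17] extended by token 42 matches phrase [17, 42, 7].
print(biased_score(-1.2, [17], 42, bias_phrases=[[17, 42, 7]]))  # -> 0.8
```

Production systems track the biasing list with a trie rather than rescanning phrases per step, but the scoring idea is the same.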
Author:
Xiaohui Zhang, Vimal Manohar, David Zhang, Frank Zhang, Yangyang Shi, Nayan Singhal, Julian Chan, Fuchun Peng, Yatharth Saraf, Mike Seltzer
Hybrid automatic speech recognition (ASR) models are typically sequentially trained with CTC or LF-MMI criteria. However, they have vastly different legacies and are usually implemented in different frameworks. In this paper, by decoupling the concepts …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::861ab2dccc141561279d114c05090001
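The decoupling idea reads naturally as making the training criterion a pluggable component of an otherwise criterion-agnostic trainer; a minimal sketch, assuming CTC as the stand-in (an LF-MMI criterion from a toolkit such as k2 would slot into the same place).

```python
# Minimal sketch: a criterion-agnostic training step. CTC stands in here;
# an LF-MMI criterion (e.g. from k2) would plug into the same slot.
import torch

def train_step(model, batch, criterion, optimizer):
    log_probs = model(batch["features"])           # (time, batch, vocab)
    loss = criterion(log_probs, batch["targets"],
                     batch["input_lengths"], batch["target_lengths"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

criterion = torch.nn.CTCLoss(blank=0)              # the swappable component
```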
Author:
Weiyi Zheng, Alex Xiao, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed
With 4.5 million hours of English speech from 10 different sources across 120 countries and models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech recognition. We propose data selection techniques to efficiently …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::75de3266af06b98ab25a97f6b1731a06
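As a hedged illustration of what data selection at this scale tends to look like, a minimal confidence filter over pseudo-labeled utterances; the record format and threshold are assumptions, not the paper's technique.

```python
# Minimal sketch: confidence-based selection of pseudo-labeled utterances,
# one common shape of data selection for very large ASR corpora.
# The record format and threshold are illustrative assumptions.
from typing import Dict, List

def select(utterances: List[Dict], threshold: float = 0.9) -> List[Dict]:
    """Keep utterances whose average token confidence clears the threshold."""
    return [u for u in utterances if u["confidence"] >= threshold]

pool = [{"id": "utt1", "confidence": 0.95},
        {"id": "utt2", "confidence": 0.42}]
print(select(pool))  # keeps only utt1
```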