Showing 1 - 10 of 59 for search: '"Michiel Bacchiani"'
Denoising diffusion probabilistic models (DDPMs) and generative adversarial networks (GANs) are popular generative models for neural vocoders. The DDPMs and GANs can be characterized by the iterative denoising framework and adversarial training, resp
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::806a1952bf9d1889dbb9e0babbc78004
http://arxiv.org/abs/2210.01029
Author:
Michael L. Seltzer, Reinhold Haeb-Umbach, Shinji Watanabe, Bjorn Hoffmeister, Heiga Zen, Michiel Bacchiani, Mehrez Souden, Tomohiro Nakatani
Published in:
IEEE Signal Processing Magazine. 36:111-124
Once a popular theme of futuristic science fiction or far-fetched technology forecasts, digital home assistants with a spoken language interface have become a ubiquitous commodity today. This success has been made possible by major advancements in si
End-to-end (E2E) modeling is advantageous for automatic speech recognition (ASR) especially for Japanese since word-based tokenization of Japanese is not trivial, and E2E modeling is able to model character sequences directly. This paper focuses on t
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fa3b42d54fba63daa5b94328fa765023
Author:
Yotaro Kubo, Michiel Bacchiani
Published in:
ICASSP
This paper proposes methods to improve a commonly used end-to-end speech recognition model, Listen-Attend-Spell (LAS). The methods we propose use multi-task learning to improve generalization of the model by leveraging information from multiple label
Author:
Kevin W. Wilson, Kean Chin, Chanwoo Kim, Bo Li, Ananya Misra, Ron Weiss, Ehsan Variani, Andrew W. Senior, Tara N. Sainath, Izhak Shafran, Michiel Bacchiani, Arun Narayanan
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing. 25:965-979
Multichannel automatic speech recognition (ASR) systems commonly separate speech enhancement, including localization, beamforming, and postfiltering, from acoustic modeling. In this paper, we perform multichannel enhancement jointly with acoustic mod
Author:
Arun Narayanan, Galen Chuang, Zhongdi Qu, Rohit Prabhavalkar, Neeraj Gaur, Parisa Haghani, Pedro J. Moreno, Michiel Bacchiani, Austin Waters
Published in:
SLT
Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hy
Author:
Khe Chai Sim, Mohamed G. Elfeky, Trevor Strohman, Ananya Misra, Michiel Bacchiani, Arun Narayanan, Anshuman Tripathi, Golan Pundak, Parisa Haghani
Published in:
SLT
Current state-of-the-art automatic speech recognition systems are trained to work in specific 'domains', defined based on factors like application, sampling rate, and codec. When such recognizers are used in conditions that do not match the training d
Author:
Bo Li, Tara N. Sainath, Khe Chai Sim, Parisa Haghani, Anshuman Tripathi, Ananya Misra, Golan Pundak, Arun Narayanan, Michiel Bacchiani
Published in:
INTERSPEECH
Published in:
ICASSP
We present an algorithm called Reliable Mask Selection-Phase Difference Channel Weighting (RMS-PDCW) which selects the target source masked by a noise source using the Angle of Arrival (AoA) information calculated using the phase difference informati
Published in:
ICASSP
This article introduces and evaluates Sampled Connectionist Temporal Classification (CTC) which connects the CTC criterion to the Cross Entropy (CE) objective through sampling. Instead of computing the logarithm of the sum of the alignment path likel