Search results - "Michiel Bacchiani"

WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration

Authors: Yuma Koizumi, Kohei Yatabe, Heiga Zen, Michiel Bacchiani

Denoising diffusion probabilistic models (DDPMs) and generative adversarial networks (GANs) are popular generative models for neural vocoders. The DDPMs and GANs can be characterized by the iterative denoising framework and adversarial training, resp

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::806a1952bf9d1889dbb9e0babbc78004
http://arxiv.org/abs/2210.01029

Speech Processing for Digital Home Assistants: Combining Signal Processing With Deep-Learning Techniques

Authors: Michael L. Seltzer, Reinhold Haeb-Umbach, Shinji Watanabe, Bjorn Hoffmeister, Heiga Zen, Michiel Bacchiani, Mehrez Souden, Tomohiro Nakatani

Published in: IEEE Signal Processing Magazine. 36:111-124

Once a popular theme of futuristic science fiction or far-fetched technology forecasts, digital home assistants with a spoken language interface have become a ubiquitous commodity today. This success has been made possible by major advancements in si

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::00bb4532eb34327e539e5070f9743a94
https://doi.org/10.1109/msp.2019.2918706

A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition

Authors: Llion Jones, Michiel Bacchiani, Yotaro Kubo, Shigeki Karita

End-to-end (E2E) modeling is advantageous for automatic speech recognition (ASR) especially for Japanese since word-based tokenization of Japanese is not trivial, and E2E modeling is able to model character sequences directly. This paper focuses on t

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fa3b42d54fba63daa5b94328fa765023

Joint Phoneme-Grapheme Model for End-To-End Speech Recognition

Authors: Yotaro Kubo, Michiel Bacchiani

Published in: ICASSP

This paper proposes methods to improve a commonly used end-to-end speech recognition model, Listen-Attend-Spell (LAS). The methods we propose use multi-task learning to improve generalization of the model by leveraging information from multiple label

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::de172df60a54526dda191d7b10157318
https://doi.org/10.1109/icassp40776.2020.9054557

Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition

Authors: Kevin W. Wilson, Kean Chin, Chanwoo Kim, Bo Li, Ananya Misra, Ron Weiss, Ehsan Variani, Andrew W. Senior, Tara N. Sainath, Izhak Shafran, Michiel Bacchiani, Arun Narayanan

Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing. 25:965-979

Multichannel automatic speech recognition (ASR) systems commonly separate speech enhancement, including localization, beamforming, and postfiltering, from acoustic modeling. In this paper, we perform multichannel enhancement jointly with acoustic mod

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::76aebae59058bc33089ab1f3a1b12fcf
https://doi.org/10.1109/taslp.2017.2672401

From Audio to Semantics: Approaches to End-to-End Spoken Language Understanding

Authors: Arun Narayanan, Galen Chuang, Zhongdi Qu, Rohit Prabhavalkar, Neeraj Gaur, Parisa Haghani, Pedro J. Moreno, Michiel Bacchiani, Austin Waters

Published in: SLT

Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hy

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d1b2db66c60a737a66d9d88c906e0de7
https://doi.org/10.1109/slt.2018.8639043

Toward Domain-Invariant Speech Recognition via Large Scale Training

Authors: Khe Chai Sim, Mohamed G. Elfeky, Trevor Strohman, Ananya Misra, Michiel Bacchiani, Arun Narayanan, Anshuman Tripathi, Golan Pundak, Parisa Haghani

Published in: SLT

Current state-of-the-art automatic speech recognition systems are trained to work in specific 'domains', defined based on factors like application, sampling rate and codec. When such recognizers are used in conditions that do not match the training d

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::a6f8761113c076ea769bce14469f25b5
https://doi.org/10.1109/slt.2018.8639610

Domain Adaptation Using Factorized Hidden Layer for Robust Automatic Speech Recognition

Authors: Bo Li, Tara N. Sainath, Khe Chai Sim, Parisa Haghani, Anshuman Tripathi, Ananya Misra, Golan Pundak, Arun Narayanan, Michiel Bacchiani

Published in: INTERSPEECH

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::0e1980f15272c878d98b40c40ce369a4
https://doi.org/10.21437/interspeech.2018-2246

Sound Source Separation Using Phase Difference and Reliable Mask Selection

Authors: Anjali Menon, Richard M. Stern, Michiel Bacchiani, Chanwoo Kim

Published in: ICASSP

We present an algorithm called Reliable Mask Selection-Phase Difference Channel Weighting (RMS-PDCW) which selects the target source masked by a noise source using the Angle of Arrival (AoA) information calculated using the phase difference informati

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::bee40e87ece14c610381bd1b7d6fcc5e
https://doi.org/10.1109/icassp.2018.8462269

Sampled Connectionist Temporal Classification

Authors: Michiel Bacchiani, Tom Bagby, Kamel Lahouel, Erik McDermott, Ehsan Variani

Published in: ICASSP

This article introduces and evaluates Sampled Connectionist Temporal Classification (CTC) which connects the CTC criterion to the Cross Entropy (CE) objective through sampling. Instead of computing the logarithm of the sum of the alignment path likel

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::82e53064794a8122ac2433f604b4c9d8
https://doi.org/10.1109/icassp.2018.8461929
