Showing 1 - 10 of 181 for search: '"Wu, Minhua"'
Author:
Wang, Jinhan, Chen, Long, Khare, Aparna, Raju, Anirudh, Dheram, Pranav, He, Di, Wu, Minhua, Stolcke, Andreas, Ravichandran, Venkatesh
We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM). Experiments on the Switchboard human-human conversation dataset demonstrate …
External link:
http://arxiv.org/abs/2401.14717
Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model. It can be used to effectively initialize the encoder of an Auto…
External link:
http://arxiv.org/abs/2210.12335
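The snippet above mentions CPC's contrastive objective. A common formulation is the InfoNCE loss, which scores a true future latent against negative samples; the sketch below is a minimal illustration under assumed shapes and variable names, not the paper's implementation.

```python
import numpy as np

def infonce_loss(context, future, negatives):
    """InfoNCE loss: score the true future latent against negatives.

    context   : (d,)   context vector c_t produced by the model
    future    : (d,)   true future latent z_{t+k}
    negatives : (n, d) latents drawn from other time steps / utterances
    All names and shapes are illustrative, not taken from the paper.
    """
    candidates = np.vstack([future, negatives])        # true sample at index 0
    scores = candidates @ context                      # dot-product similarity
    scores -= scores.max()                             # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum())  # log-softmax
    return -log_probs[0]  # negative log-likelihood of the true future

# Toy usage: a future latent correlated with the context scores low loss.
rng = np.random.default_rng(0)
c = rng.normal(size=8)
z_true = c + 0.1 * rng.normal(size=8)   # correlated with the context
z_neg = rng.normal(size=(5, 8))         # unrelated negatives
loss = infonce_loss(c, z_true, z_neg)
```

Minimizing this loss is equivalent to picking the true future out of a lineup of negatives, which lower-bounds the mutual information between context and future latents.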
Author:
Keskin, Gokce, Wu, Minhua, King, Brian, Mallidi, Harish, Gao, Yang, Droppo, Jasha, Rastrow, Ariya, Maas, Roland
Automatic speech recognition (ASR) models are typically designed to operate on a single input data type, e.g. a single or multi-channel audio streamed from a device. This design decision assumes the primary input data source does not change and if an …
External link:
http://arxiv.org/abs/2106.02750
Author:
Ray, Swayambhu Nath, Wu, Minhua, Raju, Anirudh, Ghahremani, Pegah, Bilgi, Raghavendra, Rao, Milind, Arsikere, Harish, Rastrow, Ariya, Stolcke, Andreas, Droppo, Jasha
Published in:
Proc. Interspeech, Sept. 2021, pp. 3455-3459
Comprehending the overall intent of an utterance helps a listener recognize the individual words spoken. Inspired by this fact, we perform a novel study of the impact of explicitly incorporating intent representations as additional information to improve …
External link:
http://arxiv.org/abs/2105.07071
Author:
Pulugundla, Bhargav, Gao, Yang, King, Brian, Keskin, Gokce, Mallidi, Harish, Wu, Minhua, Droppo, Jasha, Maas, Roland
Attention-based beamformers have recently been shown to be effective for multi-channel speech recognition. However, they are less capable at capturing local information. In this work, we propose a 2D Conv-Attention module which combines convolution neural networks …
External link:
http://arxiv.org/abs/2105.05920
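As a rough intuition for the attention side of attention-based multi-channel front ends, the toy sketch below combines per-channel feature vectors with softmax attention weights. The function name, shapes, and mean-vector query are all illustrative assumptions, not the paper's 2D Conv-Attention module.

```python
import numpy as np

def channel_attention(features):
    """Attention-weighted combination of per-channel features (toy sketch).

    features : (n_channels, d) one feature vector per microphone channel.
    Scores each channel against a mean-vector query and fuses channels with
    softmax weights -- a stand-in for attention-based channel combination.
    """
    query = features.mean(axis=0)                        # (d,) assumed query
    scores = features @ query / np.sqrt(features.shape[1])
    w = np.exp(scores - scores.max())
    w /= w.sum()                                         # attention weights
    return w @ features                                  # (d,) fused vector

# Toy usage: 3 channels, 2-dimensional features.
feats = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
fused = channel_attention(feats)
```

Because the weights form a convex combination, each coordinate of the fused vector stays between the per-channel minimum and maximum for that coordinate.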
Author:
Sadhu, Samik, He, Di, Huang, Che-Wei, Mallidi, Sri Harish, Wu, Minhua, Rastrow, Ariya, Stolcke, Andreas, Droppo, Jasha, Maas, Roland
Wav2vec-C introduces a novel representation learning technique combining elements from wav2vec 2.0 and VQ-VAE. Our model learns to reproduce quantized representations from partially masked speech encoding using a contrastive loss in a way similar to …
External link:
http://arxiv.org/abs/2103.08393
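The snippet mentions the VQ-VAE element of wav2vec-C: continuous encoder outputs are snapped to their nearest codebook entries. A minimal nearest-neighbour quantizer is sketched below; the function name and shapes are illustrative, and the learned-codebook training (straight-through gradients, commitment loss) is omitted.

```python
import numpy as np

def vq_quantize(latents, codebook):
    """Nearest-neighbour vector quantization (VQ-VAE style, illustrative).

    latents  : (n, d) continuous encoder outputs
    codebook : (k, d) code vectors (learned during training; fixed here)
    Returns the index of the closest code per latent and the quantized vectors.
    """
    # squared Euclidean distance between every latent and every code
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    return idx, codebook[idx]

# Toy usage: two well-separated codes, two latents near one code each.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
latents = np.array([[0.3, -0.2], [9.8, 10.4]])
idx, quantized = vq_quantize(latents, codebook)
```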
Conventional speech enhancement techniques such as beamforming have known benefits for far-field speech recognition. Our own work in frequency-domain multi-channel acoustic modeling has shown additional improvements by training a spatial filtering layer …
External link:
http://arxiv.org/abs/2002.02520
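The conventional beamforming baseline the snippet refers to can be illustrated with a time-domain delay-and-sum beamformer: align each microphone channel by its steering delay and average, reinforcing the target direction while averaging down uncorrelated noise. This is a generic textbook sketch with assumed integer sample delays, not the paper's learned spatial filtering layer.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Time-domain delay-and-sum beamformer (illustrative sketch).

    signals : (n_mics, n_samples) microphone signals
    delays  : per-mic steering delays in whole samples (assumed known)
    """
    n_mics = signals.shape[0]
    out = np.zeros(signals.shape[1])
    for sig, d in zip(signals, delays):
        out += np.roll(sig, -d)   # advance each channel by its delay
    return out / n_mics           # average the aligned channels

# Toy usage: the same 100 Hz tone reaches 3 mics with known delays + noise.
sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 100 * t)
delays = [0, 3, 7]  # samples
rng = np.random.default_rng(1)
mics = np.stack([np.roll(clean, d) + 0.5 * rng.normal(size=sr)
                 for d in delays])
enhanced = delay_and_sum(mics, delays)
```

Averaging N aligned channels leaves the coherent signal intact while cutting uncorrelated noise power by roughly a factor of N.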
In this work, we investigated the teacher-student training paradigm to train a fully learnable multi-channel acoustic model for far-field automatic speech recognition (ASR). Using a large offline teacher model trained on beamformed audio, we trained …
External link:
http://arxiv.org/abs/2002.00125
Recent literature has shown that a learned front end with multi-channel audio input can outperform traditional beam-forming algorithms for automatic speech recognition (ASR). In this paper, we present our study on multi-channel acoustic modeling using …
External link:
http://arxiv.org/abs/2002.00122
Author:
Xiang, Jing, Liu, Xueni, Hao, Yue, Zhu, Yanyan, Wu, Minhua, Lou, Jian, Wang, Yonghui, Xu, Chunwei, Xie, Yanru, Huang, Jianhui
Published in:
In Translational Oncology, Vol. 38, December 2023