Showing 1 - 10 of 185 for search: '"Hori, Takaaki"'
In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning brought considerable reductions in word error rate of more than 50% relative, compared to modeling without deep learning. In the wake of this transit
External link:
http://arxiv.org/abs/2303.03329
Author:
Swietojanski, Pawel, Braun, Stefan, Can, Dogan, da Silva, Thiago Fraga, Ghoshal, Arnab, Hori, Takaaki, Hsiao, Roger, Mason, Henry, McDermott, Erik, Silovsky, Honza, Travadi, Ruchir, Zhuang, Xiaodan
Published in:
International Conference on Acoustics, Speech, and Signal Processing, 2023
This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, wher
External link:
http://arxiv.org/abs/2211.01438
Graph-based temporal classification (GTC), a generalized form of the connectionist temporal classification loss, was recently proposed to improve automatic speech recognition (ASR) systems using graph-based supervision. For example, GTC was first use
External link:
http://arxiv.org/abs/2203.00232
The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production. Similarly to the connectionist temporal classification (CTC) objective, the RNN-T loss us
External link:
http://arxiv.org/abs/2111.01272
Author:
Shah, Ankit P., Geng, Shijie, Gao, Peng, Cherian, Anoop, Hori, Takaaki, Marks, Tim K., Le Roux, Jonathan, Hori, Chiori
In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8).
External link:
http://arxiv.org/abs/2110.06894
Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR
External link:
http://arxiv.org/abs/2110.04948
Video captioning is an essential technology to understand scenes and describe events in natural language. To apply it to real-time monitoring, a system needs not only to describe events accurately but also to produce the captions as soon as possible.
External link:
http://arxiv.org/abs/2108.02147
Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results for numerous tasks. However, the application of self-attention and attention-based encoder-decoder models remains challenging fo
External link:
http://arxiv.org/abs/2107.01269
Pseudo-labeling (PL) has been shown to be effective in semi-supervised automatic speech recognition (ASR), where a base model is self-trained with pseudo-labels generated from unlabeled data. While PL can be further improved by iteratively updating p
External link:
http://arxiv.org/abs/2106.08922