Search results - "Hori, Takaaki"

Report

Authors: Prabhavalkar, Rohit, Hori, Takaaki, Sainath, Tara N., Schlüter, Ralf, Watanabe, Shinji

In the last decade of automatic speech recognition (ASR) research, the introduction of deep learning brought considerable reductions in word error rate of more than 50% relative, compared to modeling without deep learning. In the wake of this transit

External link: http://arxiv.org/abs/2303.03329

E-book

Speech recognition algorithms using weighted finite-state transducers [electronic resource] / Takaaki Hori, Atsushi Nakamura.

Author: Hori, Takaaki

External link: KNAV e-book collection. Registered users: full text online 5 minutes, further access on request.

Report

Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

Authors: Swietojanski, Pawel, Braun, Stefan, Can, Dogan, da Silva, Thiago Fraga, Ghoshal, Arnab, Hori, Takaaki, Hsiao, Roger, Mason, Henry, McDermott, Erik, Silovsky, Honza, Travadi, Ruchir, Zhuang, Xiaodan

Published in: 2023 International Conference on Acoustics, Speech, and Signal Processing

This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, wher

External link: http://arxiv.org/abs/2211.01438

Report

Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR

Authors: Chang, Xuankai, Moritz, Niko, Hori, Takaaki, Watanabe, Shinji, Roux, Jonathan Le

Graph-based temporal classification (GTC), a generalized form of the connectionist temporal classification loss, was recently proposed to improve automatic speech recognition (ASR) systems using graph-based supervision. For example, GTC was first use

External link: http://arxiv.org/abs/2203.00232

Report

Sequence Transduction with Graph-based Supervision

Authors: Moritz, Niko, Hori, Takaaki, Watanabe, Shinji, Roux, Jonathan Le

The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production. Similarly to the connectionist temporal classification (CTC) objective, the RNN-T loss us

External link: http://arxiv.org/abs/2111.01272

Report

Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning

Authors: Shah, Ankit P., Geng, Shijie, Gao, Peng, Cherian, Anoop, Hori, Takaaki, Marks, Tim K., Roux, Jonathan Le, Hori, Chiori

In previous work, we have proposed the Audio-Visual Scene-Aware Dialog (AVSD) task, collected an AVSD dataset, developed AVSD technologies, and hosted an AVSD challenge track at both the 7th and 8th Dialog System Technology Challenges (DSTC7, DSTC8).

External link: http://arxiv.org/abs/2110.06894

Report

Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy

Authors: Higuchi, Yosuke, Moritz, Niko, Roux, Jonathan Le, Hori, Takaaki

Pseudo-labeling (PL), a semi-supervised learning (SSL) method where a seed model performs self-training using pseudo-labels generated from untranscribed speech, has been shown to enhance the performance of end-to-end automatic speech recognition (ASR

External link: http://arxiv.org/abs/2110.04948

Report

Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers

Authors: Hori, Chiori, Hori, Takaaki, Roux, Jonathan Le

Video captioning is an essential technology to understand scenes and describe events in natural language. To apply it to real-time monitoring, a system needs not only to describe events accurately but also to produce the captions as soon as possible.

External link: http://arxiv.org/abs/2108.02147

Report

Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition

Authors: Moritz, Niko, Hori, Takaaki, Roux, Jonathan Le

Attention-based end-to-end automatic speech recognition (ASR) systems have recently demonstrated state-of-the-art results for numerous tasks. However, the application of self-attention and attention-based encoder-decoder models remains challenging fo

External link: http://arxiv.org/abs/2107.01269

Report

Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition

Authors: Higuchi, Yosuke, Moritz, Niko, Roux, Jonathan Le, Hori, Takaaki

Pseudo-labeling (PL) has been shown to be effective in semi-supervised automatic speech recognition (ASR), where a base model is self-trained with pseudo-labels generated from unlabeled data. While PL can be further improved by iteratively updating p

External link: http://arxiv.org/abs/2106.08922
