Search results - "Kida, Yusuke"

Report

Mask-CTC-based Encoder Pre-training for Streaming End-to-End Speech Recognition

Authors: Zhao, Huaibo; Higuchi, Yosuke; Kida, Yusuke; Ogawa, Tetsuji; Kobayashi, Tetsunori

Achieving high accuracy with low latency has always been a challenge in streaming end-to-end automatic speech recognition (ASR) systems. By attending to more future contexts, a streaming ASR model achieves higher accuracy but results in larger latenc

External link: http://arxiv.org/abs/2309.04654

Report

Neural Diarization with Non-autoregressive Intermediate Attractors

Authors: Fujita, Yusuke; Komatsu, Tatsuya; Scheibler, Robin; Kida, Yusuke; Ogawa, Tetsuji

End-to-end neural diarization (EEND) with encoder-decoder-based attractors (EDA) is a promising method to handle the whole speaker diarization problem simultaneously with a single neural network. While the EEND model can produce all frame-level speak

External link: http://arxiv.org/abs/2303.06806

Report

Conversation-oriented ASR with multi-look-ahead CBS architecture

Authors: Zhao, Huaibo; Fujie, Shinya; Ogawa, Tetsuji; Sakuma, Jin; Kida, Yusuke; Kobayashi, Tetsunori

During conversations, humans are capable of inferring the intention of the speaker at any point of the speech to prepare the following action promptly. Such ability is also the key for conversational systems to achieve rhythmic and natural conversati

External link: http://arxiv.org/abs/2211.00858

Report

Tourist Guidance Robot Based on HyperCLOVA

Authors: Yamazaki, Takato; Yoshikawa, Katsumasa; Kawamoto, Toshiki; Ohagi, Masaya; Mizumoto, Tomoya; Ichimura, Shuta; Kida, Yusuke; Sato, Toshinori

This paper describes our system submitted to Dialogue Robot Competition 2022. Our proposed system is a combined model of rule-based and generation-based dialog systems. The system utilizes HyperCLOVA, a Japanese foundation model, not only to generate

External link: http://arxiv.org/abs/2210.10400

Report

InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

Authors: Nakagome, Yu; Komatsu, Tatsuya; Fujita, Yusuke; Ichimura, Shuta; Kida, Yusuke

This paper proposes InterAug: a novel training method for CTC-based ASR using augmented intermediate representations for conditioning. The proposed method exploits the conditioning framework of self-conditioned CTC to train robust models by condition

External link: http://arxiv.org/abs/2204.00174

Report

Better Intermediates Improve CTC Inference

Authors: Komatsu, Tatsuya; Fujita, Yusuke; Lee, Jaesong; Lee, Lukas; Watanabe, Shinji; Kida, Yusuke

This paper proposes a method for improved CTC inference with searched intermediates and multi-pass conditioning. The paper first formulates self-conditioned CTC as a probabilistic model with an intermediate prediction as a latent representation and p

External link: http://arxiv.org/abs/2204.00176

Report

Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR

Authors: Fujita, Yusuke; Komatsu, Tatsuya; Kida, Yusuke

End-to-end automatic speech recognition directly maps input speech to characters. However, the mapping can be problematic when several different pronunciations should be mapped into one character or when one pronunciation is shared among many differe

External link: http://arxiv.org/abs/2204.00175

Academic article

Flame acceleration and detonation initiation around a T-shaped bifurcation

Authors: Honda, Tomoaki; Ogawa, Syotaro; Kida, Yusuke; Kim, Wookyung; Johzaki, Tomoyuki; Yatsufusa, Tomoaki; Endo, Takuma

Published in: Journal of Loss Prevention in the Process Industries, July 2024, 89

Report

Label-Synchronous Speech-to-Text Alignment for ASR Using Forward and Backward Transformers

Authors: Kida, Yusuke; Komatsu, Tatsuya; Togami, Masahito

This paper proposes a novel label-synchronous speech-to-text alignment technique for automatic speech recognition (ASR). The speech-to-text alignment is a problem of splitting long audio recordings with un-aligned transcripts into utterance-wise pair

External link: http://arxiv.org/abs/2104.10328

Report

Voice Activity Detection: Merging Source and Filter-based Information

Authors: Drugman, Thomas; Stylianou, Yannis; Kida, Yusuke; Akamine, Masami

Published in: IEEE Signal Processing Letters, Volume 23, Issue 2, pp. 252-256, 2015

Voice Activity Detection (VAD) refers to the problem of distinguishing speech segments from background noise. Numerous approaches have been proposed for this purpose. Some are based on features derived from the power spectral density, others exploit

External link: http://arxiv.org/abs/1903.02844
