Showing 1 - 10 of 28 for search: '"Kida, Yusuke"'
Achieving high accuracy with low latency has always been a challenge in streaming end-to-end automatic speech recognition (ASR) systems. By attending to more future contexts, a streaming ASR model achieves higher accuracy but results in larger latenc
External link:
http://arxiv.org/abs/2309.04654
End-to-end neural diarization (EEND) with encoder-decoder-based attractors (EDA) is a promising method to handle the whole speaker diarization problem simultaneously with a single neural network. While the EEND model can produce all frame-level speak
External link:
http://arxiv.org/abs/2303.06806
During conversations, humans are capable of inferring the intention of the speaker at any point of the speech to prepare the following action promptly. Such ability is also the key for conversational systems to achieve rhythmic and natural conversati
External link:
http://arxiv.org/abs/2211.00858
Author:
Yamazaki, Takato; Yoshikawa, Katsumasa; Kawamoto, Toshiki; Ohagi, Masaya; Mizumoto, Tomoya; Ichimura, Shuta; Kida, Yusuke; Sato, Toshinori
This paper describes our system submitted to Dialogue Robot Competition 2022. Our proposed system is a combined model of rule-based and generation-based dialog systems. The system utilizes HyperCLOVA, a Japanese foundation model, not only to generate
External link:
http://arxiv.org/abs/2210.10400
This paper proposes InterAug: a novel training method for CTC-based ASR using augmented intermediate representations for conditioning. The proposed method exploits the conditioning framework of self-conditioned CTC to train robust models by condition
External link:
http://arxiv.org/abs/2204.00174
This paper proposes a method for improved CTC inference with searched intermediates and multi-pass conditioning. The paper first formulates self-conditioned CTC as a probabilistic model with an intermediate prediction as a latent representation and p
External link:
http://arxiv.org/abs/2204.00176
Alternate Intermediate Conditioning with Syllable-level and Character-level Targets for Japanese ASR
End-to-end automatic speech recognition directly maps input speech to characters. However, the mapping can be problematic when several different pronunciations should be mapped into one character or when one pronunciation is shared among many differe
External link:
http://arxiv.org/abs/2204.00175
Author:
Honda, Tomoaki; Ogawa, Syotaro; Kida, Yusuke; Kim, Wookyung; Johzaki, Tomoyuki; Yatsufusa, Tomoaki; Endo, Takuma
Published in:
Journal of Loss Prevention in the Process Industries, July 2024, 89
This paper proposes a novel label-synchronous speech-to-text alignment technique for automatic speech recognition (ASR). The speech-to-text alignment is a problem of splitting long audio recordings with un-aligned transcripts into utterance-wise pair
External link:
http://arxiv.org/abs/2104.10328
Published in:
IEEE Signal Processing Letters, Volume 23, Issue 2, pp. 252-256, 2015
Voice Activity Detection (VAD) refers to the problem of distinguishing speech segments from background noise. Numerous approaches have been proposed for this purpose. Some are based on features derived from the power spectral density, others exploit
External link:
http://arxiv.org/abs/1903.02844