Showing 1 - 10
of 4,491
for search: '"Sato, Hiroshi"'
Author:
Sato, Hiroshi, Moriya, Takafumi, Mimura, Masato, Horiguchi, Shota, Ochiai, Tsubasa, Ashihara, Takanori, Ando, Atsushi, Shinayama, Kentaro, Delcroix, Marc
Real-time target speaker extraction (TSE) is intended to extract the desired speaker's voice from the observed mixture of multiple speakers in a streaming manner. Implementing real-time TSE is challenging as the computational complexity must be reduc
External link:
http://arxiv.org/abs/2407.01857
Author:
Ochiai, Tsubasa, Iwamoto, Kazuma, Delcroix, Marc, Ikeshita, Rintaro, Sato, Hiroshi, Araki, Shoko, Katagiri, Shigeru
It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with a single-channel speech enhancement (SE) front-end. This is generally attributed to the processing distortions caused by the nonlinear processing of
External link:
http://arxiv.org/abs/2404.14860
Author:
Fujita, Kenichi, Sato, Hiroshi, Ashihara, Takanori, Kanagawa, Hiroki, Delcroix, Marc, Moriya, Takafumi, Ijima, Yusuke
The zero-shot text-to-speech (TTS) method, based on speaker embeddings extracted from reference speech using self-supervised learning (SSL) speech representations, can reproduce speaker characteristics very accurately. However, this approach suffers
External link:
http://arxiv.org/abs/2401.05111
Author:
Iwamoto, Kazuma, Ochiai, Tsubasa, Delcroix, Marc, Ikeshita, Rintaro, Sato, Hiroshi, Araki, Shoko, Katagiri, Shigeru
Jointly training a speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end has been investigated as a way to mitigate the influence of processing distortion generated by single-channel SE on ASR. In this paper, we
External link:
http://arxiv.org/abs/2311.11599
Author:
Fujino, Osamu, Sato, Hiroshi
If a toric foliation on a projective Q-factorial toric variety has an extremal ray whose length is longer than the rank of the foliation, then the associated extremal contraction is a projective space bundle and the foliation is the relative tangent
External link:
http://arxiv.org/abs/2309.09461
Author:
Masumura, Ryo, Makishima, Naoki, Yamane, Taiga, Yamazaki, Yoshihiko, Mizuno, Saki, Ihori, Mana, Uchida, Mihiro, Suzuki, Keita, Sato, Hiroshi, Tanaka, Tomohiro, Takashima, Akihiko, Suzuki, Satoshi, Moriya, Takafumi, Hojo, Nobukatsu, Ando, Atsushi
This paper proposes a novel automatic speech recognition (ASR) system that can transcribe individual speaker's speech while identifying whether they are target or non-target speakers from multi-talker overlapped speech. Target-speaker ASR systems are
External link:
http://arxiv.org/abs/2306.02273
Author:
Moriya, Takafumi, Sato, Hiroshi, Ochiai, Tsubasa, Delcroix, Marc, Ashihara, Takanori, Matsuura, Kohei, Tanaka, Tomohiro, Masumura, Ryo, Ogawa, Atsunori, Asami, Taichi
Neural transducer (RNNT)-based target-speaker speech recognition (TS-RNNT) directly transcribes a target speaker's voice from a multi-talker mixture. It is a promising approach for streaming applications because it does not incur the extra computatio
External link:
http://arxiv.org/abs/2305.15971
Author:
Moriya, Takafumi, Ashihara, Takanori, Sato, Hiroshi, Matsuura, Kohei, Tanaka, Tomohiro, Masumura, Ryo
The recurrent neural network-transducer (RNNT) is a promising approach for automatic speech recognition (ASR) with the introduction of a prediction network that autoregressively considers linguistic aspects. To train the autoregressive part, the grou
External link:
http://arxiv.org/abs/2305.15958
Author:
Sato, Hiroshi, Masumura, Ryo, Ochiai, Tsubasa, Delcroix, Marc, Moriya, Takafumi, Ashihara, Takanori, Shinayama, Kentaro, Mizuno, Saki, Ihori, Mana, Tanaka, Tomohiro, Hojo, Nobukatsu
Self-supervised learning (SSL) is the latest breakthrough in speech processing, especially for label-scarce downstream tasks by leveraging massive unlabeled audio data. The noise robustness of the SSL is one of the important challenges to expanding i
External link:
http://arxiv.org/abs/2305.14723