Search results - "Sato, Hiroshi"

Report

SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling

Authors: Sato, Hiroshi; Moriya, Takafumi; Mimura, Masato; Horiguchi, Shota; Ochiai, Tsubasa; Ashihara, Takanori; Ando, Atsushi; Shinayama, Kentaro; Delcroix, Marc

Real-time target speaker extraction (TSE) is intended to extract the desired speaker's voice from the observed mixture of multiple speakers in a streaming manner. Implementing real-time TSE is challenging as the computational complexity must be reduc

External link: http://arxiv.org/abs/2407.01857

Report

Rethinking Processing Distortions: Disentangling the Impact of Speech Enhancement Errors on Speech Recognition Performance

Authors: Ochiai, Tsubasa; Iwamoto, Kazuma; Delcroix, Marc; Ikeshita, Rintaro; Sato, Hiroshi; Araki, Shoko; Katagiri, Shigeru

It is challenging to improve automatic speech recognition (ASR) performance in noisy conditions with a single-channel speech enhancement (SE) front-end. This is generally attributed to the processing distortions caused by the nonlinear processing of

External link: http://arxiv.org/abs/2404.14860

Report

Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters

Authors: Fujita, Kenichi; Sato, Hiroshi; Ashihara, Takanori; Kanagawa, Hiroki; Delcroix, Marc; Moriya, Takafumi; Ijima, Yusuke

The zero-shot text-to-speech (TTS) method, based on speaker embeddings extracted from reference speech using self-supervised learning (SSL) speech representations, can reproduce speaker characteristics very accurately. However, this approach suffers

External link: http://arxiv.org/abs/2401.05111

Report

How does end-to-end speech recognition training impact speech enhancement artifacts?

Authors: Iwamoto, Kazuma; Ochiai, Tsubasa; Delcroix, Marc; Ikeshita, Rintaro; Sato, Hiroshi; Araki, Shoko; Katagiri, Shigeru

Jointly training a speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end has been investigated as a way to mitigate the influence of processing distortion generated by single-channel SE on ASR. In this paper, we

External link: http://arxiv.org/abs/2311.11599

Report

A remark on toric foliations

Authors: Fujino, Osamu; Sato, Hiroshi

If a toric foliation on a projective Q-factorial toric variety has an extremal ray whose length is longer than the rank of the foliation, then the associated extremal contraction is a projective space bundle and the foliation is the relative tangent

External link: http://arxiv.org/abs/2309.09461

Report

End-to-End Joint Target and Non-Target Speakers ASR

Authors: Masumura, Ryo; Makishima, Naoki; Yamane, Taiga; Yamazaki, Yoshihiko; Mizuno, Saki; Ihori, Mana; Uchida, Mihiro; Suzuki, Keita; Sato, Hiroshi; Tanaka, Tomohiro; Takashima, Akihiko; Suzuki, Satoshi; Moriya, Takafumi; Hojo, Nobukatsu; Ando, Atsushi

This paper proposes a novel automatic speech recognition (ASR) system that can transcribe individual speaker's speech while identifying whether they are target or non-target speakers from multi-talker overlapped speech. Target-speaker ASR systems are

External link: http://arxiv.org/abs/2306.02273

Report

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

Authors: Moriya, Takafumi; Sato, Hiroshi; Ochiai, Tsubasa; Delcroix, Marc; Ashihara, Takanori; Matsuura, Kohei; Tanaka, Tomohiro; Masumura, Ryo; Ogawa, Atsunori; Asami, Taichi

Neural transducer (RNNT)-based target-speaker speech recognition (TS-RNNT) directly transcribes a target speaker's voice from a multi-talker mixture. It is a promising approach for streaming applications because it does not incur the extra computatio

External link: http://arxiv.org/abs/2305.15971

Report

Improving Scheduled Sampling for Neural Transducer-based ASR

Authors: Moriya, Takafumi; Ashihara, Takanori; Sato, Hiroshi; Matsuura, Kohei; Tanaka, Tomohiro; Masumura, Ryo

The recurrent neural network-transducer (RNNT) is a promising approach for automatic speech recognition (ASR) with the introduction of a prediction network that autoregressively considers linguistic aspects. To train the autoregressive part, the grou

External link: http://arxiv.org/abs/2305.15958

Report

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

Authors: Sato, Hiroshi; Masumura, Ryo; Ochiai, Tsubasa; Delcroix, Marc; Moriya, Takafumi; Ashihara, Takanori; Shinayama, Kentaro; Mizuno, Saki; Ihori, Mana; Tanaka, Tomohiro; Hojo, Nobukatsu

Self-supervised learning (SSL) is the latest breakthrough in speech processing, especially for label-scarce downstream tasks by leveraging massive unlabeled audio data. The noise robustness of the SSL is one of the important challenges to expanding i

External link: http://arxiv.org/abs/2305.14723
