Showing 1-10 of 1,489 for search: '"Tanaka, Tomohiro"'
The advance of generative models for images has inspired various training techniques for image recognition utilizing synthetic images. In semantic segmentation, one promising approach is extracting pseudo-masks from attention maps in text-to-image di
External link:
http://arxiv.org/abs/2309.01369
Authors:
Ashihara, Takanori, Moriya, Takafumi, Matsuura, Kohei, Tanaka, Tomohiro, Ijima, Yusuke, Asami, Taichi, Delcroix, Marc, Honma, Yukinori
Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition. More recently, speech SSL models have also been shown to be beneficial in advancing spoken lan
External link:
http://arxiv.org/abs/2306.08374
Authors:
Matsuura, Kohei, Ashihara, Takanori, Moriya, Takafumi, Tanaka, Tomohiro, Kano, Takatomo, Ogawa, Atsunori, Delcroix, Marc
End-to-end speech summarization (E2E SSum) directly summarizes input speech into easy-to-read short sentences with a single model. This approach is promising because it, in contrast to the conventional cascade approach, can utilize full acoustical in
External link:
http://arxiv.org/abs/2306.04233
Authors:
Masumura, Ryo, Makishima, Naoki, Yamane, Taiga, Yamazaki, Yoshihiko, Mizuno, Saki, Ihori, Mana, Uchida, Mihiro, Suzuki, Keita, Sato, Hiroshi, Tanaka, Tomohiro, Takashima, Akihiko, Suzuki, Satoshi, Moriya, Takafumi, Hojo, Nobukatsu, Ando, Atsushi
This paper proposes a novel automatic speech recognition (ASR) system that can transcribe individual speakers' speech while identifying whether they are target or non-target speakers from multi-talker overlapped speech. Target-speaker ASR systems are
External link:
http://arxiv.org/abs/2306.02273
Authors:
Moriya, Takafumi, Sato, Hiroshi, Ochiai, Tsubasa, Delcroix, Marc, Ashihara, Takanori, Matsuura, Kohei, Tanaka, Tomohiro, Masumura, Ryo, Ogawa, Atsunori, Asami, Taichi
Neural transducer (RNNT)-based target-speaker speech recognition (TS-RNNT) directly transcribes a target speaker's voice from a multi-talker mixture. It is a promising approach for streaming applications because it does not incur the extra computatio
External link:
http://arxiv.org/abs/2305.15971
Authors:
Moriya, Takafumi, Ashihara, Takanori, Sato, Hiroshi, Matsuura, Kohei, Tanaka, Tomohiro, Masumura, Ryo
The recurrent neural network-transducer (RNNT) is a promising approach for automatic speech recognition (ASR) with the introduction of a prediction network that autoregressively considers linguistic aspects. To train the autoregressive part, the grou
External link:
http://arxiv.org/abs/2305.15958
Authors:
Sato, Hiroshi, Masumura, Ryo, Ochiai, Tsubasa, Delcroix, Marc, Moriya, Takafumi, Ashihara, Takanori, Shinayama, Kentaro, Mizuno, Saki, Ihori, Mana, Tanaka, Tomohiro, Hojo, Nobukatsu
Self-supervised learning (SSL) is the latest breakthrough in speech processing, especially for label-scarce downstream tasks by leveraging massive unlabeled audio data. The noise robustness of SSL is one of the important challenges to expanding i
External link:
http://arxiv.org/abs/2305.14723
Self-supervised learning (SSL) has been dramatically successful not only in monolingual but also in cross-lingual settings. However, since the two settings have been studied individually in general, there has been little research focusing on how effe
External link:
http://arxiv.org/abs/2305.05201
Authors:
Matsuura, Kohei, Ashihara, Takanori, Moriya, Takafumi, Tanaka, Tomohiro, Ogawa, Atsunori, Delcroix, Marc, Masumura, Ryo
End-to-end speech summarization (E2E SSum) is a technique to directly generate summary sentences from speech. Compared with the cascade approach, which combines automatic speech recognition (ASR) and text summarization models, the E2E approach is mor
External link:
http://arxiv.org/abs/2303.00978
Authors:
Yoshihashi, Ryota, Nishimura, Shuhei, Yonebayashi, Dai, Otsuka, Yuya, Tanaka, Tomohiro, Miyazaki, Takashi
Siamese-network-based self-supervised learning (SSL) suffers from slow convergence and instability in training. To alleviate this, we propose a framework to exploit intermediate self-supervisions in each stage of deep nets, called the Ladder Siamese
External link:
http://arxiv.org/abs/2211.13844