Search results - "Tanaka, Tomohiro"

Report

Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation

Authors: Yoshihashi, Ryota, Otsuka, Yuya, Doi, Kenji, Tanaka, Tomohiro, Kataoka, Hirokatsu

The advance of generative models for images has inspired various training techniques for image recognition utilizing synthetic images. In semantic segmentation, one promising approach is extracting pseudo-masks from attention maps in text-to-image di…

External link: http://arxiv.org/abs/2309.01369

Report

SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?

Authors: Ashihara, Takanori, Moriya, Takafumi, Matsuura, Kohei, Tanaka, Tomohiro, Ijima, Yusuke, Asami, Taichi, Delcroix, Marc, Honma, Yukinori

Self-supervised learning (SSL) for speech representation has been successfully applied in various downstream tasks, such as speech and speaker recognition. More recently, speech SSL models have also been shown to be beneficial in advancing spoken lan…

External link: http://arxiv.org/abs/2306.08374

Report

Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization

Authors: Matsuura, Kohei, Ashihara, Takanori, Moriya, Takafumi, Tanaka, Tomohiro, Kano, Takatomo, Ogawa, Atsunori, Delcroix, Marc

End-to-end speech summarization (E2E SSum) directly summarizes input speech into easy-to-read short sentences with a single model. This approach is promising because it, in contrast to the conventional cascade approach, can utilize full acoustical in…

External link: http://arxiv.org/abs/2306.04233

Report

End-to-End Joint Target and Non-Target Speakers ASR

Authors: Masumura, Ryo, Makishima, Naoki, Yamane, Taiga, Yamazaki, Yoshihiko, Mizuno, Saki, Ihori, Mana, Uchida, Mihiro, Suzuki, Keita, Sato, Hiroshi, Tanaka, Tomohiro, Takashima, Akihiko, Suzuki, Satoshi, Moriya, Takafumi, Hojo, Nobukatsu, Ando, Atsushi

This paper proposes a novel automatic speech recognition (ASR) system that can transcribe individual speaker's speech while identifying whether they are target or non-target speakers from multi-talker overlapped speech. Target-speaker ASR systems are…

External link: http://arxiv.org/abs/2306.02273

Report

Knowledge Distillation for Neural Transducer-based Target-Speaker ASR: Exploiting Parallel Mixture/Single-Talker Speech Data

Authors: Moriya, Takafumi, Sato, Hiroshi, Ochiai, Tsubasa, Delcroix, Marc, Ashihara, Takanori, Matsuura, Kohei, Tanaka, Tomohiro, Masumura, Ryo, Ogawa, Atsunori, Asami, Taichi

Neural transducer (RNNT)-based target-speaker speech recognition (TS-RNNT) directly transcribes a target speaker's voice from a multi-talker mixture. It is a promising approach for streaming applications because it does not incur the extra computatio…

External link: http://arxiv.org/abs/2305.15971

Report

Improving Scheduled Sampling for Neural Transducer-based ASR

Authors: Moriya, Takafumi, Ashihara, Takanori, Sato, Hiroshi, Matsuura, Kohei, Tanaka, Tomohiro, Masumura, Ryo

The recurrent neural network-transducer (RNNT) is a promising approach for automatic speech recognition (ASR) with the introduction of a prediction network that autoregressively considers linguistic aspects. To train the autoregressive part, the grou…

External link: http://arxiv.org/abs/2305.15958

Report

Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss

Authors: Sato, Hiroshi, Masumura, Ryo, Ochiai, Tsubasa, Delcroix, Marc, Moriya, Takafumi, Ashihara, Takanori, Shinayama, Kentaro, Mizuno, Saki, Ihori, Mana, Tanaka, Tomohiro, Hojo, Nobukatsu

Self-supervised learning (SSL) is the latest breakthrough in speech processing, especially for label-scarce downstream tasks by leveraging massive unlabeled audio data. The noise robustness of the SSL is one of the important challenges to expanding i…

External link: http://arxiv.org/abs/2305.14723

Report

Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models

Authors: Ashihara, Takanori, Moriya, Takafumi, Matsuura, Kohei, Tanaka, Tomohiro

Self-supervised learning (SSL) has been dramatically successful not only in monolingual but also in cross-lingual settings. However, since the two settings have been studied individually in general, there has been little research focusing on how effe…

External link: http://arxiv.org/abs/2305.05201

Report

Leveraging Large Text Corpora for End-to-End Speech Summarization

Authors: Matsuura, Kohei, Ashihara, Takanori, Moriya, Takafumi, Tanaka, Tomohiro, Ogawa, Atsunori, Delcroix, Marc, Masumura, Ryo

End-to-end speech summarization (E2E SSum) is a technique to directly generate summary sentences from speech. Compared with the cascade approach, which combines automatic speech recognition (ASR) and text summarization models, the E2E approach is mor…

External link: http://arxiv.org/abs/2303.00978

Report

Ladder Siamese Network: a Method and Insights for Multi-level Self-Supervised Learning

Authors: Yoshihashi, Ryota, Nishimura, Shuhei, Yonebayashi, Dai, Otsuka, Yuya, Tanaka, Tomohiro, Miyazaki, Takashi

Siamese-network-based self-supervised learning (SSL) suffers from slow convergence and instability in training. To alleviate this, we propose a framework to exploit intermediate self-supervisions in each stage of deep nets, called the Ladder Siamese…

External link: http://arxiv.org/abs/2211.13844
