Search results: "Yasuda,Yusuke"

Report

Automatic design optimization of preference-based subjective evaluation with online learning in crowdsourcing environment

Authors: Yasuda, Yusuke; Toda, Tomoki

A preference-based subjective evaluation is a key method for evaluating generative media reliably. However, its huge combinations of pairs prohibit it from being applied to large-scale evaluation using crowdsourcing. To address this issue, we propose

External link: http://arxiv.org/abs/2403.06100

Report

Preference-based training framework for automatic speech quality assessment using deep neural network

Authors: Hu, Cheng-Hung; Yasuda, Yusuke; Toda, Tomoki

One objective of Speech Quality Assessment (SQA) is to estimate the ranks of synthetic speech systems. However, recent SQA models are typically trained using low-precision direct scores such as mean opinion scores (MOS) as the training objective, whi

External link: http://arxiv.org/abs/2308.15203

Report

Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder

Authors: Yasuda, Yusuke; Toda, Tomoki

Text-to-speech synthesis (TTS) is a task to convert texts into speech. Two of the factors that have been driving TTS are the advancements of probabilistic models and latent representation learning. We propose a TTS method based on latent variable con

External link: http://arxiv.org/abs/2212.08329

Report

Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language

Authors: Yasuda, Yusuke; Toda, Tomoki

Published in: IEEE Journal of Selected Topics in Signal Processing (Volume: 16, Issue: 6, October 2022)

End-to-end text-to-speech synthesis (TTS) can generate highly natural synthetic speech from raw text. However, rendering the correct pitch accents is still a challenging problem for end-to-end TTS. To tackle the challenge of rendering correct pitch a

External link: http://arxiv.org/abs/2212.08321

Report

ESPnet2-TTS: Extending the Edge of TTS Research

Authors: Hayashi, Tomoki; Yamamoto, Ryuichi; Yoshimura, Takenori; Wu, Peter; Shi, Jiatong; Saeki, Takaaki; Ju, Yooncheol; Yasuda, Yusuke; Takamichi, Shinnosuke; Watanabe, Shinji

This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit. ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features, including: on-the-fly flexible pre-processing, joint training with neural vocoders, an

External link: http://arxiv.org/abs/2110.07840

Report

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis

Authors: Cooper, Erica; Wang, Xin; Zhao, Yi; Yasuda, Yusuke; Yamagishi, Junichi

We explore pretraining strategies including choice of base corpus with the aim of choosing the best strategy for zero-shot multi-speaker end-to-end synthesis. We also examine choice of neural vocoder for waveform synthesis, as well as acoustic config

External link: http://arxiv.org/abs/2011.04839

Report

Authors: Kato, Shuhei; Yasuda, Yusuke; Wang, Xin; Cooper, Erica; Yamagishi, Junichi

We have been working on speech synthesis for rakugo (a traditional Japanese form of verbal entertainment similar to one-person stand-up comedy) toward speech synthesis that authentically entertains audiences. In this paper, we propose a novel evaluat

External link: http://arxiv.org/abs/2010.11549

Report

End-to-End Text-to-Speech using Latent Duration based on VQ-VAE

Authors: Yasuda, Yusuke; Wang, Xin; Yamagishi, Junichi

Explicit duration modeling is a key to achieving robust and efficient alignment in text-to-speech synthesis (TTS). We propose a new TTS framework using explicit duration modeling that incorporates duration as a discrete latent variable to TTS and ena

External link: http://arxiv.org/abs/2010.09602

Report

Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis

Authors: Yasuda, Yusuke; Wang, Xin; Yamagishi, Junichi

Neural sequence-to-sequence text-to-speech synthesis (TTS) can produce high-quality speech directly from text or simple linguistic features such as phonemes. Unlike traditional pipeline TTS, the neural sequence-to-sequence TTS does not require manual

External link: http://arxiv.org/abs/2005.10390

Report

Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

Authors: Cooper, Erica; Lai, Cheng-I; Yasuda, Yusuke; Yamagishi, Junichi

Previous work on speaker adaptation for end-to-end speech synthesis still falls short in speaker similarity. We investigate an orthogonal approach to the current speaker adaptation paradigms, speaker augmentation, by creating artificial speakers and

External link: http://arxiv.org/abs/2005.01245