Showing 1 - 10 of 12 for search: '"Shirahata, Yuma"'
We propose a novel description-based controllable text-to-speech (TTS) method with cross-lingual control capability. To address the lack of audio-description paired data in the target language, we combine a TTS model trained on the target language with …
External link:
http://arxiv.org/abs/2409.17452
We propose UNIVERSE++, a universal speech enhancement method based on score-based diffusion and adversarial training. Specifically, we improve the existing UNIVERSE model that decouples clean speech feature extraction and diffusion. Our contributions …
External link:
http://arxiv.org/abs/2406.12194
This paper proposes an audio-conditioned phonemic and prosodic annotation model for building text-to-speech (TTS) datasets from unlabeled speech samples. For creating a TTS dataset that consists of label-speech paired data, the proposed annotation model …
External link:
http://arxiv.org/abs/2406.08111
We introduce LibriTTS-P, a new corpus based on LibriTTS-R that includes utterance-level descriptions (i.e., prompts) of speaking style and speaker-level prompts of speaker characteristics. We employ a hybrid approach to construct prompt annotations: …
External link:
http://arxiv.org/abs/2406.07969
Author:
Shimizu, Reo, Yamamoto, Ryuichi, Kawamura, Masaya, Shirahata, Yuma, Doi, Hironori, Komatsu, Tatsuya, Tachibana, Kentaro
We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that allows control over speaker identity using natural language descriptions. To control speaker identity within the prompt-based TTS framework, we introduce the concept of …
External link:
http://arxiv.org/abs/2309.08140
We propose a lightweight end-to-end text-to-speech model using multi-band generation and inverse short-time Fourier transform. Our model is based on VITS, a high-quality end-to-end text-to-speech model, but adopts two changes for more efficient inference …
External link:
http://arxiv.org/abs/2210.15975
Author:
Shirahata, Yuma, Yamamoto, Ryuichi, Song, Eunwoo, Terashima, Ryo, Kim, Jae-Min, Tachibana, Kentaro
Several fully end-to-end text-to-speech (TTS) models have been proposed that have shown better performance compared to cascade models (i.e., training acoustic and vocoder models separately). However, they often generate unstable pitch contour with audible …
External link:
http://arxiv.org/abs/2210.15964
Author:
Terashima, Ryo, Yamamoto, Ryuichi, Song, Eunwoo, Shirahata, Yuma, Yoon, Hyun-Wook, Kim, Jae-Min, Tachibana, Kentaro
Data augmentation via voice conversion (VC) has been successfully applied to low-resource expressive text-to-speech (TTS) when only neutral data for the target speaker are available. Although the quality of VC is crucial for this approach, it is challenging …
External link:
http://arxiv.org/abs/2204.10020
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
We propose a lightweight end-to-end text-to-speech model using multi-band generation and inverse short-time Fourier transform. Our model is based on VITS, a high-quality end-to-end text-to-speech model, but adopts two changes for more efficient inference …
Author:
Shirahata, Yuma, Yamamoto, Ryuichi, Song, Eunwoo, Terashima, Ryo, Kim, Jae-Min, Tachibana, Kentaro
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Several fully end-to-end text-to-speech (TTS) models have been proposed that have shown better performance compared to cascade models (i.e., training acoustic and vocoder models separately). However, they often generate unstable pitch contour with audible …