Search results

Academic article

JVNV: A Corpus of Japanese Emotional Speech With Verbal Content and Nonverbal Expressions

Authors: Detai Xin, Junfeng Jiang, Shinnosuke Takamichi, Yuki Saito, Akiko Aizawa, Hiroshi Saruwatari

Published in: IEEE Access, Vol. 12, pp. 19752-19764 (2024)

We present the JVNV, a Japanese emotional speech corpus with verbal content and nonverbal vocalizations whose scripts are generated by a large-scale language model. Existing emotional speech corpora lack not only proper emotional scripts but also non

External link: https://doaj.org/article/b121501764a048fc8ac3fdaf60d2cfd7

Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech

Authors: Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari

Pause insertion, also known as phrase break prediction and phrasing, is an essential part of TTS systems because proper pauses with natural duration significantly enhance the rhythm and intelligibility of synthetic speech. However, conventional phras

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2cfcd44e1a66851b87c8700e9d7c7213

Mid-attribute speaker generation using optimal-transport-based interpolation of Gaussian mixture models

Authors: Aya Watanabe, Shinnosuke Takamichi, Yuki Saito, Detai Xin, Hiroshi Saruwatari

In this paper, we propose a method for intermediating multiple speakers' attributes and diversifying their voice characteristics in "speaker generation," an emerging task that aims to synthesize a nonexistent speaker's naturally sounding voice. The

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::eec0972e1974421c176eed162e5fa2a6

Cross-Lingual Speaker Adaptation Using Domain Adaptation and Speaker Consistency Loss for Text-To-Speech Synthesis

Authors: Hiroshi Saruwatari, Shinnosuke Takamichi, Yuki Saito, Detai Xin, Tomoki Koriyama

Published in: Interspeech 2021.

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::85a7512dfe673c2f314ea3e75e0182d7
https://doi.org/10.21437/interspeech.2021-897

Disentangled Speaker and Language Representations Using Mutual Information Minimization and Domain Adaptation for Cross-Lingual TTS

Authors: Hiroshi Saruwatari, Detai Xin, Shinnosuke Takamichi, Tatsuya Komatsu

Published in: ICASSP

We propose a method for obtaining disentangled speaker and language representations via mutual information minimization and domain adaptation for cross-lingual text-to-speech (TTS) synthesis. The proposed method extracts speaker and language embeddin

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::c6b2dab54f235d2beaef34b73d6bfd20
https://doi.org/10.1109/icassp39728.2021.9414226

Cross-Lingual Text-To-Speech Synthesis via Domain Adaptation and Perceptual Similarity Regression in Speaker Space

Authors: Yuki Saito, Hiroshi Saruwatari, Shinnosuke Takamichi, Tomoki Koriyama, Detai Xin

Published in: INTERSPEECH

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::dffb2e89270f47d8f8e4affce723fed7
https://doi.org/10.21437/interspeech.2020-2070
