Showing 1 - 10 of 443 for the search: '"SARUWATARI, Hiroshi"'
Author:
Hyodo, Hiroaki, Takamichi, Shinnosuke, Nakamura, Tomohiko, Koguchi, Junya, Saruwatari, Hiroshi
We propose a singing voice synthesis (SVS) method for a more unified ensemble singing voice by modeling interactions between singers. Most existing SVS methods aim to synthesize a solo voice, and do not consider interactions between singers, i.e., ad…
External link:
http://arxiv.org/abs/2409.09988
We present our system (denoted as T05) for the VoiceMOS Challenge (VMC) 2024. Our system was designed for the VMC 2024 Track 1, which focused on the accurate prediction of naturalness mean opinion score (MOS) for high-quality synthetic speech. In add…
External link:
http://arxiv.org/abs/2409.09305
Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT
We explore cross-dialect text-to-speech (CD-TTS), a task to synthesize learned speakers' voices in non-native dialects, especially in pitch-accent languages. CD-TTS is important for developing voice agents that naturally communicate with people acros…
External link:
http://arxiv.org/abs/2409.07265
We present BigCodec, a low-bitrate neural speech codec. While recent neural speech codecs have shown impressive progress, their performance significantly deteriorates at low bitrates (around 1 kbps). Although a low bitrate inherently restricts perfor…
External link:
http://arxiv.org/abs/2409.05377
This paper presents SaSLaW, a spontaneous dialogue speech corpus containing synchronous recordings of what speakers speak, listen to, and watch. Humans consider the diverse environmental factors and then control the features of their utterances in fa…
External link:
http://arxiv.org/abs/2408.06858
Author:
Nakata, Wataru, Seki, Kentaro, Yanaka, Hitomi, Saito, Yuki, Takamichi, Shinnosuke, Saruwatari, Hiroshi
Spoken dialogue plays a crucial role in human-AI interactions, necessitating dialogue-oriented spoken language models (SLMs). To develop versatile SLMs, large-scale and diverse speech datasets are essential. Additionally, to ensure high-quality speec…
External link:
http://arxiv.org/abs/2407.15828
Author:
Seki, Kentaro, Takamichi, Shinnosuke, Takamune, Norihiro, Saito, Yuki, Imamura, Kanami, Saruwatari, Hiroshi
This paper proposes a new task called spatial voice conversion, which aims to convert a target voice while preserving spatial information and non-target signals. Traditional voice conversion methods focus on single-channel waveforms, ignoring the ste…
External link:
http://arxiv.org/abs/2406.17722
Author:
Igarashi, Takuto, Saito, Yuki, Seki, Kentaro, Takamichi, Shinnosuke, Yamamoto, Ryuichi, Tachibana, Kentaro, Saruwatari, Hiroshi
We propose noise-robust voice conversion (VC) which takes into account the recording quality and environment of noisy source speech. Conventional denoising training improves the noise robustness of a VC model by learning the noisy-to-clean VC process. Ho…
External link:
http://arxiv.org/abs/2406.07280
Author:
Saito, Yuki, Igarashi, Takuto, Seki, Kentaro, Takamichi, Shinnosuke, Yamamoto, Ryuichi, Tachibana, Kentaro, Saruwatari, Hiroshi
We present SRC4VC, a new corpus containing 11 hours of speech recorded on smartphones by 100 Japanese speakers. Although high-quality multi-speaker corpora can advance voice conversion (VC) technologies, they are not always suitable for testing VC wh…
External link:
http://arxiv.org/abs/2406.07254
Author:
Xin, Detai, Tan, Xu, Shen, Kai, Ju, Zeqian, Yang, Dongchao, Wang, Yuancheng, Takamichi, Shinnosuke, Saruwatari, Hiroshi, Liu, Shujie, Li, Jinyu, Zhao, Sheng
We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as…
External link:
http://arxiv.org/abs/2404.03204