Showing 1 - 10 of 439 for search: '"Saruwatari, Hiroshi"'
Author:
Seki, Kentaro, Takamichi, Shinnosuke, Takamune, Norihiro, Saito, Yuki, Imamura, Kanami, Saruwatari, Hiroshi
This paper proposes a new task called spatial voice conversion, which aims to convert a target voice while preserving spatial information and non-target signals. Traditional voice conversion methods focus on single-channel waveforms, ignoring the ste
External link:
http://arxiv.org/abs/2406.17722
Author:
Igarashi, Takuto, Saito, Yuki, Seki, Kentaro, Takamichi, Shinnosuke, Yamamoto, Ryuichi, Tachibana, Kentaro, Saruwatari, Hiroshi
We propose noise-robust voice conversion (VC) which takes into account the recording quality and environment of noisy source speech. Conventional denoising training improves the noise robustness of a VC model by learning noisy-to-clean VC process. Ho
External link:
http://arxiv.org/abs/2406.07280
Author:
Saito, Yuki, Igarashi, Takuto, Seki, Kentaro, Takamichi, Shinnosuke, Yamamoto, Ryuichi, Tachibana, Kentaro, Saruwatari, Hiroshi
We present SRC4VC, a new corpus containing 11 hours of speech recorded on smartphones by 100 Japanese speakers. Although high-quality multi-speaker corpora can advance voice conversion (VC) technologies, they are not always suitable for testing VC wh
External link:
http://arxiv.org/abs/2406.07254
Author:
Xin, Detai, Tan, Xu, Shen, Kai, Ju, Zeqian, Yang, Dongchao, Wang, Yuancheng, Takamichi, Shinnosuke, Saruwatari, Hiroshi, Liu, Shujie, Li, Jinyu, Zhao, Sheng
We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as
External link:
http://arxiv.org/abs/2404.03204
Author:
Watanabe, Aya, Takamichi, Shinnosuke, Saito, Yuki, Nakata, Wataru, Xin, Detai, Saruwatari, Hiroshi
In text-to-speech synthesis, the ability to control voice characteristics is vital for various applications. By leveraging thriving text prompt-based generation techniques, it should be possible to enhance the nuanced control of voice characteristics
External link:
http://arxiv.org/abs/2403.13353
Real-time speech extraction is an important challenge with various applications such as speech recognition in a human-like avatar/robot. In this paper, we propose the real-time extension of a speech extraction method based on independent low-rank mat
External link:
http://arxiv.org/abs/2403.12477
While subjective assessments have been the gold standard for evaluating speech generation, there is a growing need for objective metrics that are highly correlated with human subjective judgments due to their cost efficiency. This paper proposes refe
External link:
http://arxiv.org/abs/2401.16812
A method for synthesizing the desired sound field while suppressing the exterior radiation power with directional weighting is proposed. The exterior radiation from the loudspeakers in sound field synthesis systems can be problematic in practical sit
External link:
http://arxiv.org/abs/2401.05809
Author:
Xin, Detai, Jiang, Junfeng, Takamichi, Shinnosuke, Saito, Yuki, Aizawa, Akiko, Saruwatari, Hiroshi
We present the JVNV, a Japanese emotional speech corpus with verbal content and nonverbal vocalizations whose scripts are generated by a large-scale language model. Existing emotional speech corpora lack not only proper emotional scripts but also non
External link:
http://arxiv.org/abs/2310.06072
Author:
Watanabe, Aya, Takamichi, Shinnosuke, Saito, Yuki, Nakata, Wataru, Xin, Detai, Saruwatari, Hiroshi
In text-to-speech, controlling voice characteristics is important in achieving various-purpose speech synthesis. Considering the success of text-conditioned generation, such as text-to-image, free-form text instruction should be useful for intuitive
External link:
http://arxiv.org/abs/2309.13509