Showing 1 - 10 of 462 for search: '"YAMAMOTO, Ryuichi"'
Neural vocoders often struggle with aliasing in latent feature spaces, caused by time-domain nonlinear operations and resampling layers. Aliasing folds high-frequency components into the low-frequency range, making aliased and original frequency comp…
External link:
http://arxiv.org/abs/2411.06807
We propose a novel description-based controllable text-to-speech (TTS) method with cross-lingual control capability. To address the lack of audio-description paired data in the target language, we combine a TTS model trained on the target language wi…
External link:
http://arxiv.org/abs/2409.17452
With the advancements in singing voice generation and the growing presence of AI singers on media platforms, the inaugural Singing Voice Deepfake Detection (SVDD) Challenge aims to advance research in identifying AI-generated singing voices from auth…
External link:
http://arxiv.org/abs/2408.16132
This paper proposes an audio-conditioned phonemic and prosodic annotation model for building text-to-speech (TTS) datasets from unlabeled speech samples. For creating a TTS dataset that consists of label-speech paired data, the proposed annotation mo…
External link:
http://arxiv.org/abs/2406.08111
We introduce LibriTTS-P, a new corpus based on LibriTTS-R that includes utterance-level descriptions (i.e., prompts) of speaking style and speaker-level prompts of speaker characteristics. We employ a hybrid approach to construct prompt annotations: …
External link:
http://arxiv.org/abs/2406.07969
Author:
Igarashi, Takuto, Saito, Yuki, Seki, Kentaro, Takamichi, Shinnosuke, Yamamoto, Ryuichi, Tachibana, Kentaro, Saruwatari, Hiroshi
We propose noise-robust voice conversion (VC) which takes into account the recording quality and environment of noisy source speech. Conventional denoising training improves the noise robustness of a VC model by learning a noisy-to-clean VC process. Ho…
External link:
http://arxiv.org/abs/2406.07280
Author:
Saito, Yuki, Igarashi, Takuto, Seki, Kentaro, Takamichi, Shinnosuke, Yamamoto, Ryuichi, Tachibana, Kentaro, Saruwatari, Hiroshi
We present SRC4VC, a new corpus containing 11 hours of speech recorded on smartphones by 100 Japanese speakers. Although high-quality multi-speaker corpora can advance voice conversion (VC) technologies, they are not always suitable for testing VC wh…
External link:
http://arxiv.org/abs/2406.07254
Author:
Zang, Yongyi, Shi, Jiatong, Zhang, You, Yamamoto, Ryuichi, Han, Jionghao, Tang, Yuxun, Xu, Shengyuan, Zhao, Wenxiao, Guo, Jing, Toda, Tomoki, Duan, Zhiyao
Published in:
Proceedings of Interspeech 2024
Recent singing voice synthesis and conversion advancements necessitate robust singing voice deepfake detection (SVDD) models. Current SVDD datasets face challenges due to limited controllability, diversity in deepfake methods, and licensing restricti…
External link:
http://arxiv.org/abs/2406.02438
Author:
Zhang, You, Zang, Yongyi, Shi, Jiatong, Yamamoto, Ryuichi, Han, Jionghao, Tang, Yuxun, Toda, Tomoki, Duan, Zhiyao
The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing voice presen…
External link:
http://arxiv.org/abs/2405.05244
This paper presents our systems (denoted as T13) for the Singing Voice Conversion Challenge (SVCC) 2023. For both in-domain and cross-domain English singing voice conversion (SVC) tasks (Task 1 and Task 2), we adopt a recognition-synthesis approach w…
External link:
http://arxiv.org/abs/2310.05203