Showing 1 - 10 of 140 for search: '"Takamichi, Shinnosuke"'
This paper introduces CocoNut-Humoresque, an open-source large-scale speech likability corpus that includes speech segments and their per-listener likability scores. Evaluating voice likability is essential to designing preferable voices for speech s…
External link:
http://arxiv.org/abs/2407.04270
Author:
Seki, Kentaro, Takamichi, Shinnosuke, Takamune, Norihiro, Saito, Yuki, Imamura, Kanami, Saruwatari, Hiroshi
This paper proposes a new task called spatial voice conversion, which aims to convert a target voice while preserving spatial information and non-target signals. Traditional voice conversion methods focus on single-channel waveforms, ignoring the ste…
External link:
http://arxiv.org/abs/2406.17722
Author:
Igarashi, Takuto, Saito, Yuki, Seki, Kentaro, Takamichi, Shinnosuke, Yamamoto, Ryuichi, Tachibana, Kentaro, Saruwatari, Hiroshi
We propose noise-robust voice conversion (VC) which takes into account the recording quality and environment of noisy source speech. Conventional denoising training improves the noise robustness of a VC model by learning a noisy-to-clean VC process. Ho…
External link:
http://arxiv.org/abs/2406.07280
Author:
Saito, Yuki, Igarashi, Takuto, Seki, Kentaro, Takamichi, Shinnosuke, Yamamoto, Ryuichi, Tachibana, Kentaro, Saruwatari, Hiroshi
We present SRC4VC, a new corpus containing 11 hours of speech recorded on smartphones by 100 Japanese speakers. Although high-quality multi-speaker corpora can advance voice conversion (VC) technologies, they are not always suitable for testing VC wh…
External link:
http://arxiv.org/abs/2406.07254
Author:
Li, Xinjian, Takamichi, Shinnosuke, Saeki, Takaaki, Chen, William, Shiota, Sayaka, Watanabe, Shinji
In this study, we introduce YODAS (YouTube-Oriented Dataset for Audio and Speech), a large-scale, multilingual dataset comprising currently over 500k hours of speech data in more than 100 languages, sourced from both labeled and unlabeled YouTube spe…
External link:
http://arxiv.org/abs/2406.00899
Author:
Xin, Detai, Tan, Xu, Shen, Kai, Ju, Zeqian, Yang, Dongchao, Wang, Yuancheng, Takamichi, Shinnosuke, Saruwatari, Hiroshi, Liu, Shujie, Li, Jinyu, Zhao, Sheng
We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as…
External link:
http://arxiv.org/abs/2404.03204
Author:
Watanabe, Aya, Takamichi, Shinnosuke, Saito, Yuki, Nakata, Wataru, Xin, Detai, Saruwatari, Hiroshi
In text-to-speech synthesis, the ability to control voice characteristics is vital for various applications. By leveraging thriving text prompt-based generation techniques, it should be possible to enhance the nuanced control of voice characteristics…
External link:
http://arxiv.org/abs/2403.13353
While subjective assessments have been the gold standard for evaluating speech generation, there is a growing need for objective metrics that are highly correlated with human subjective judgments due to their cost efficiency. This paper proposes refe…
External link:
http://arxiv.org/abs/2401.16812
Author:
Xin, Detai, Jiang, Junfeng, Takamichi, Shinnosuke, Saito, Yuki, Aizawa, Akiko, Saruwatari, Hiroshi
We present the JVNV, a Japanese emotional speech corpus with verbal content and nonverbal vocalizations whose scripts are generated by a large-scale language model. Existing emotional speech corpora lack not only proper emotional scripts but also non…
External link:
http://arxiv.org/abs/2310.06072
Author:
Watanabe, Aya, Takamichi, Shinnosuke, Saito, Yuki, Nakata, Wataru, Xin, Detai, Saruwatari, Hiroshi
In text-to-speech, controlling voice characteristics is important in achieving various-purpose speech synthesis. Considering the success of text-conditioned generation, such as text-to-image, free-form text instruction should be useful for intuitive…
External link:
http://arxiv.org/abs/2309.13509