Search results - "Saruwatari, Hiroshi"

Report

Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals

Authors: Seki, Kentaro; Takamichi, Shinnosuke; Takamune, Norihiro; Saito, Yuki; Imamura, Kanami; Saruwatari, Hiroshi

This paper proposes a new task called spatial voice conversion, which aims to convert a target voice while preserving spatial information and non-target signals. Traditional voice conversion methods focus on single-channel waveforms, ignoring the ste…

External link: http://arxiv.org/abs/2406.17722

Report

Noise-Robust Voice Conversion by Conditional Denoising Training Using Latent Variables of Recording Quality and Environment

Authors: Igarashi, Takuto; Saito, Yuki; Seki, Kentaro; Takamichi, Shinnosuke; Yamamoto, Ryuichi; Tachibana, Kentaro; Saruwatari, Hiroshi

We propose noise-robust voice conversion (VC) which takes into account the recording quality and environment of noisy source speech. Conventional denoising training improves the noise robustness of a VC model by learning noisy-to-clean VC process. Ho…

External link: http://arxiv.org/abs/2406.07280

Report

SRC4VC: Smartphone-Recorded Corpus for Voice Conversion Benchmark

Authors: Saito, Yuki; Igarashi, Takuto; Seki, Kentaro; Takamichi, Shinnosuke; Yamamoto, Ryuichi; Tachibana, Kentaro; Saruwatari, Hiroshi

We present SRC4VC, a new corpus containing 11 hours of speech recorded on smartphones by 100 Japanese speakers. Although high-quality multi-speaker corpora can advance voice conversion (VC) technologies, they are not always suitable for testing VC wh…

External link: http://arxiv.org/abs/2406.07254

Report

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Authors: Xin, Detai; Tan, Xu; Shen, Kai; Ju, Zeqian; Yang, Dongchao; Wang, Yuancheng; Takamichi, Shinnosuke; Saruwatari, Hiroshi; Liu, Shujie; Li, Jinyu; Zhao, Sheng

We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as…

External link: http://arxiv.org/abs/2404.03204

Report

Building speech corpus with diverse voice characteristics for its prompt-based representation

Authors: Watanabe, Aya; Takamichi, Shinnosuke; Saito, Yuki; Nakata, Wataru; Xin, Detai; Saruwatari, Hiroshi

In text-to-speech synthesis, the ability to control voice characteristics is vital for various applications. By leveraging thriving text prompt-based generation techniques, it should be possible to enhance the nuanced control of voice characteristics…

External link: http://arxiv.org/abs/2403.13353

Report

Real-time Speech Extraction Using Spatially Regularized Independent Low-rank Matrix Analysis and Rank-constrained Spatial Covariance Matrix Estimation

Authors: Ishikawa, Yuto; Konaka, Kohei; Nakamura, Tomohiko; Takamune, Norihiro; Saruwatari, Hiroshi

Real-time speech extraction is an important challenge with various applications such as speech recognition in a human-like avatar/robot. In this paper, we propose the real-time extension of a speech extraction method based on independent low-rank mat…

External link: http://arxiv.org/abs/2403.12477

Report

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics

Authors: Saeki, Takaaki; Maiti, Soumi; Takamichi, Shinnosuke; Watanabe, Shinji; Saruwatari, Hiroshi

While subjective assessments have been the gold standard for evaluating speech generation, there is a growing need for objective metrics that are highly correlated with human subjective judgments due to their cost efficiency. This paper proposes refe…

External link: http://arxiv.org/abs/2401.16812

Report

Localizing Acoustic Energy in Sound Field Synthesis by Directionally Weighted Exterior Radiation Suppression

Authors: Tomita, Yoshihide; Koyama, Shoichi; Saruwatari, Hiroshi

A method for synthesizing the desired sound field while suppressing the exterior radiation power with directional weighting is proposed. The exterior radiation from the loudspeakers in sound field synthesis systems can be problematic in practical sit…

External link: http://arxiv.org/abs/2401.05809

Report

JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions

Authors: Xin, Detai; Jiang, Junfeng; Takamichi, Shinnosuke; Saito, Yuki; Aizawa, Akiko; Saruwatari, Hiroshi

We present the JVNV, a Japanese emotional speech corpus with verbal content and nonverbal vocalizations whose scripts are generated by a large-scale language model. Existing emotional speech corpora lack not only proper emotional scripts but also non…

External link: http://arxiv.org/abs/2310.06072

Report

Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control

Authors: Watanabe, Aya; Takamichi, Shinnosuke; Saito, Yuki; Nakata, Wataru; Xin, Detai; Saruwatari, Hiroshi

In text-to-speech, controlling voice characteristics is important in achieving various-purpose speech synthesis. Considering the success of text-conditioned generation, such as text-to-image, free-form text instruction should be useful for intuitive…

External link: http://arxiv.org/abs/2309.13509
