Showing 1 - 10 of 178 for search: '"Tokuda, Keiichi"'
This paper presents a neural vocoder based on a denoising diffusion probabilistic model (DDPM) incorporating explicit periodic signals as auxiliary conditioning signals. Recently, DDPM-based neural vocoders have gained prominence as non-autoregressive…
External link:
http://arxiv.org/abs/2402.14692
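The two ingredients named in the abstract can be sketched in isolation: an explicit periodic conditioning signal (a sine wave at the target F0) and a single DDPM reverse (denoising) step. This is a minimal NumPy sketch under assumed names (`make_periodic_conditioning`, `ddpm_reverse_step` are illustrative, not the paper's code); in the real system the predicted noise `eps_hat` would come from a neural network conditioned on acoustic features plus the periodic signal.

```python
import numpy as np

def make_periodic_conditioning(f0, sr=16000, n=256):
    """Explicit periodic signal (sine at frequency f0) used as auxiliary conditioning."""
    t = np.arange(n) / sr
    return np.sin(2 * np.pi * f0 * t)

def ddpm_reverse_step(x_t, t, eps_hat, betas):
    """One reverse (denoising) DDPM step given predicted noise eps_hat.

    Uses the standard parameterization: alpha_t = 1 - beta_t,
    alpha_bar_t = prod of alphas up to t.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    coef = betas[t] / np.sqrt(1.0 - alpha_bar[t])
    mean = (x_t - coef * eps_hat) / np.sqrt(alphas[t])
    if t == 0:                      # final step is deterministic
        return mean
    z = np.random.randn(*x_t.shape)
    return mean + np.sqrt(betas[t]) * z
```

Sampling would start from Gaussian noise and apply `ddpm_reverse_step` from the last timestep down to 0, with a linear schedule such as `betas = np.linspace(1e-4, 0.02, T)`.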
This paper proposes singing voice synthesis (SVS) based on frame-level sequence-to-sequence models considering vocal timing deviation. In SVS, it is essential to synchronize the timing of singing with temporal structures represented by scores, taking…
External link:
http://arxiv.org/abs/2301.02262
This paper proposes a novel sequence-to-sequence (seq2seq) model with a musical note position-aware attention mechanism for singing voice synthesis (SVS). A seq2seq modeling approach that can simultaneously perform acoustic and temporal modeling is…
External link:
http://arxiv.org/abs/2212.13703
Author:
Yoshimura, Takenori, Takaki, Shinji, Nakamura, Kazuhiro, Oura, Keiichiro, Hono, Yukiya, Hashimoto, Kei, Nankaku, Yoshihiko, Tokuda, Keiichi
This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system towards end-to-end controllable speech synthesis. Since the mel-cepstral synthesis filter is explicitly embedded in neural waveform models…
External link:
http://arxiv.org/abs/2211.11222
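The mel-cepstral synthesis filter referred to above has the well-known exponential transfer function H(z) = exp Σ_m c(m) z̃^{-m}, where z̃^{-1} is a first-order all-pass element with warping coefficient α. A minimal sketch, assuming only this textbook definition (the function name is illustrative): evaluating the filter's log-amplitude response on the unit circle via the all-pass phase β(ω) = ω + 2·arctan(α·sin ω / (1 − α·cos ω)).

```python
import numpy as np

def mel_cepstrum_to_spectrum(mc, alpha=0.42, n_fft=512):
    """Log-amplitude response of the mel-cepstral synthesis filter
    H(z) = exp(sum_m mc[m] * ztilde^{-m}) on the unit circle,
    where ztilde^{-1} is the first-order all-pass with coefficient alpha."""
    omega = np.linspace(0, np.pi, n_fft // 2 + 1)
    # phase response of the all-pass element: beta(omega)
    beta = omega + 2 * np.arctan2(alpha * np.sin(omega), 1 - alpha * np.cos(omega))
    m = np.arange(len(mc))
    # log|H(e^{j omega})| = sum_m mc[m] * cos(m * beta(omega))
    return mc @ np.cos(np.outer(m, beta))
```

With α = 0 this reduces to an ordinary cepstrum; α ≈ 0.42 approximates the mel scale at 16 kHz. The paper's contribution is embedding such a filter differentiably inside a neural waveform model, which this sketch does not attempt.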
Author:
Mitsui, Kentaro, Zhao, Tianyu, Sawada, Kei, Hono, Yukiya, Nankaku, Yoshihiko, Tokuda, Keiichi
Recent text-to-speech (TTS) systems have achieved quality comparable to that of humans; however, their application to spoken dialogue has not been widely studied. This study aims to realize TTS that closely resembles human dialogue. First, we record and…
External link:
http://arxiv.org/abs/2206.12040
Author:
Nankaku, Yoshihiko, Sumiya, Kenta, Yoshimura, Takenori, Takaki, Shinji, Hashimoto, Kei, Oura, Keiichiro, Tokuda, Keiichi
This paper proposes a novel Sequence-to-Sequence (Seq2Seq) model integrating the structure of Hidden Semi-Markov Models (HSMMs) into its attention mechanism. In speech synthesis, it has been shown that methods based on Seq2Seq models using deep neural…
External link:
http://arxiv.org/abs/2108.13985
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 2803-2815, 2021
This paper presents Sinsy, a deep neural network (DNN)-based singing voice synthesis (SVS) system. In recent years, DNNs have been utilized in statistical parametric SVS systems, and DNN-based SVS systems have demonstrated better performance than conventional…
External link:
http://arxiv.org/abs/2108.02776
Author:
Hono, Yukiya, Takaki, Shinji, Hashimoto, Kei, Oura, Keiichiro, Nankaku, Yoshihiko, Tokuda, Keiichi
We propose PeriodNet, a non-autoregressive (non-AR) waveform generation model with a new model structure for modeling periodic and aperiodic components in speech waveforms. The non-AR waveform generation models can generate speech waveforms parallelly…
External link:
http://arxiv.org/abs/2102.07786
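The periodic/aperiodic decomposition mentioned in the abstract can be illustrated with a toy excitation generator: a sine wave driven by frame-level F0 carries the periodic component, Gaussian noise the aperiodic one, and unvoiced frames (F0 = 0) fall back to noise only. This is a sketch under assumed names and parameters (`excitation`, `hop`, `sr` are illustrative); PeriodNet itself learns both components with neural generators rather than using raw sinusoids and noise directly.

```python
import numpy as np

def excitation(f0_frames, hop=80, sr=16000, seed=0):
    """Toy periodic/aperiodic excitation: sine for voiced regions,
    Gaussian noise for unvoiced regions (f0 == 0)."""
    rng = np.random.default_rng(seed)
    f0 = np.repeat(np.asarray(f0_frames, float), hop)   # frame rate -> sample rate
    phase = 2 * np.pi * np.cumsum(f0) / sr              # integrate F0 into phase
    periodic = np.sin(phase)
    aperiodic = rng.standard_normal(f0.shape)
    voiced = (f0 > 0).astype(float)
    return voiced * periodic + (1.0 - voiced) * aperiodic
```

Integrating F0 into a running phase (rather than sampling a fixed-frequency sine per frame) keeps the sinusoid continuous across frame boundaries when the pitch changes.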
Author:
Hono, Yukiya, Tsuboi, Kazuna, Sawada, Kei, Hashimoto, Kei, Oura, Keiichiro, Nankaku, Yoshihiko, Tokuda, Keiichi
This paper proposes a hierarchical generative model with a multi-grained latent variable to synthesize expressive speech. In recent years, fine-grained latent variables have been introduced into text-to-speech synthesis, enabling fine control of…
External link:
http://arxiv.org/abs/2009.08474
Author:
Nakamura, Kazuhiro, Takaki, Shinji, Hashimoto, Kei, Oura, Keiichiro, Nankaku, Yoshihiko, Tokuda, Keiichi
The present paper describes singing voice synthesis based on convolutional neural networks (CNNs). Singing voice synthesis systems based on deep neural networks (DNNs) are currently being proposed and are improving the naturalness of synthesized singing…
External link:
http://arxiv.org/abs/1910.11690