Showing 1 - 10 of 21 for search: '"Lee, Joun Yeop"'
We present SegINR, a novel approach to neural Text-to-Speech (TTS) that addresses sequence alignment without relying on an auxiliary duration predictor and complex autoregressive (AR) or non-autoregressive (NAR) frame-level sequence modeling. SegINR …
External link:
http://arxiv.org/abs/2410.04690
We propose a novel two-stage text-to-speech (TTS) framework with two types of discrete tokens, i.e., semantic and acoustic tokens, for high-fidelity speech synthesis. It features two core components: the Interpreting module, which processes text and …
External link:
http://arxiv.org/abs/2406.17310
We propose a novel text-to-speech (TTS) framework centered around a neural transducer. Our approach divides the whole TTS pipeline into semantic-level sequence-to-sequence (seq2seq) modeling and fine-grained acoustic modeling stages, utilizing discrete …
External link:
http://arxiv.org/abs/2401.01498
We present a fast and high-quality codec language model for parallel audio generation. While SoundStorm, a state-of-the-art parallel audio generation model, accelerates inference speed compared to autoregressive models, it still suffers from slow inference …
External link:
http://arxiv.org/abs/2401.01099
Author:
Bae, Jae-Sung, Lee, Joun Yeop, Lee, Ji-Hyun, Mun, Seongkyu, Kang, Taehwa, Cho, Hoon-Young, Kim, Chanwoo
Previous works in zero-shot text-to-speech (ZS-TTS) have attempted to enhance its systems by enlarging the training data through crowd-sourcing or augmenting existing speech data. However, the use of low-quality data has led to a decline in the overall …
External link:
http://arxiv.org/abs/2310.03538
Zero-shot multi-speaker text-to-speech (ZSM-TTS) models aim to generate a speech sample with the voice characteristic of an unseen speaker. The main challenge of ZSM-TTS is to increase the overall speaker similarity for unseen speakers. One of the most …
External link:
http://arxiv.org/abs/2211.16866
Author:
Lee, Jihwan, Bae, Jae-Sung, Mun, Seongkyu, Choi, Heejin, Lee, Joun Yeop, Cho, Hoon-Young, Kim, Chanwoo
With the recent developments in cross-lingual Text-to-Speech (TTS) systems, L2 (second-language, or foreign) accent problems arise. Moreover, running a subjective evaluation for such cross-lingual TTS systems is troublesome. The vowel space analysis, …
External link:
http://arxiv.org/abs/2211.03078
Author:
Lee, Jihwan, Lee, Joun Yeop, Choi, Heejin, Mun, Seongkyu, Park, Sangjun, Bae, Jae-Sung, Kim, Chanwoo
Intonations play an important role in delivering the intention of a speaker. However, current end-to-end TTS systems often fail to model proper intonations. To alleviate this problem, we propose a novel, intuitive method to synthesize speech in different …
External link:
http://arxiv.org/abs/2204.01271
Author:
Kim, Minchan, Jeong, Myeonghun, Choi, Byoung Jin, Ahn, Sunghwan, Lee, Joun Yeop, Kim, Nam Soo
Training a text-to-speech (TTS) model requires a large-scale text-labeled speech corpus, which is troublesome to collect. In this paper, we propose a transfer learning framework for TTS that utilizes a large amount of unlabeled speech dataset for pre-training …
External link:
http://arxiv.org/abs/2203.15447
Flow-based generative models are composed of invertible transformations between two random variables of the same dimension. Therefore, flow-based models cannot be adequately trained if the dimension of the data distribution does not match that of the …
External link:
http://arxiv.org/abs/2006.04604