Showing 1 - 10 of 1,070 results for search: '"Sişman AS"'
Recent advancements in Text-to-Speech (TTS) systems have enabled the generation of natural and expressive speech from textual input. Accented TTS aims to enhance user experience by making the synthesized speech more relatable to minority group listen
External link:
http://arxiv.org/abs/2410.13342
Voice conversion (VC) aims to modify the speaker's identity while preserving the linguistic content. Commonly, VC methods use an encoder-decoder architecture, where disentangling the speaker's identity from linguistic information is crucial. However,
External link:
http://arxiv.org/abs/2409.11560
Synthesizing the voices of unseen speakers is a persisting challenge in multi-speaker text-to-speech (TTS). Most multi-speaker TTS models rely on modeling speaker characteristics through speaker conditioning during training. Modeling unseen speaker a
External link:
http://arxiv.org/abs/2408.17432
Recent advancements in flat-bottomed optical box traps have enabled the realization of homogeneous Bose gases, allowing for the exploration of Bose-Einstein condensation in more complex confinement geometries. Here we propose a shape-induced Bose-Ein
External link:
http://arxiv.org/abs/2408.12698
Current strategies for achieving fine-grained prosody control in speech synthesis entail extracting additional style embeddings or adopting more complex architectures. To enable zero-shot application of pretrained text-to-speech (TTS) models, we pres
External link:
http://arxiv.org/abs/2408.06827
Author:
Chen, Changyou, Ding, Han, Sisman, Bunyamin, Xu, Yi, Xie, Ouye, Yao, Benjamin Z., Tran, Son Dinh, Zeng, Belinda
Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to a single-generation modeling. Can we generalize diffusion models with the ability of multi-mod
External link:
http://arxiv.org/abs/2407.17571
In speech synthesis, modeling the rich emotions and prosodic variations present in the human voice is crucial for synthesizing natural speech. Although speaker embeddings have been widely used in personalized speech synthesis as conditioning inputs, they ar
External link:
http://arxiv.org/abs/2407.04291
Author:
Salman, Ali N., Du, Zongyang, Chandra, Shreeram Suresh, Ulgen, Ismail Rasim, Busso, Carlos, Sisman, Berrak
Voice conversion (VC) research traditionally depends on scripted or acted speech, which lacks the natural spontaneity of real-life conversations. While natural speech data is limited for VC, our study focuses on filling in this gap. We introduce a no
External link:
http://arxiv.org/abs/2406.04494
Recent advances in style transfer text-to-speech (TTS) have improved the expressiveness of synthesized speech. However, encoding stylistic information (e.g., timbre, emotion, and prosody) from diverse and unseen reference speech remains a challenge.
External link:
http://arxiv.org/abs/2406.03637
With rapid globalization, the need to build inclusive and representative speech technology cannot be overstated. Accent is an important aspect of speech that needs to be taken into consideration while building inclusive speech synthesizers. Inclusive
External link:
http://arxiv.org/abs/2406.01018