Showing 1 - 10 of 278 for search: '"Zhang, Guangyan"'
Author:
Zhang, Guangyan, Merritt, Thomas, Ribeiro, Manuel Sam, Tura-Vecino, Biel, Yanagisawa, Kayoko, Pokora, Kamil, Ezzerg, Abdelhamid, Cygert, Sebastian, Abbas, Ammar, Bilinski, Piotr, Barra-Chicote, Roberto, Korzekwa, Daniel, Lorenzo-Trueba, Jaime
Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong assumptions about the distributions of the target data space. Aiming to improve those assumptions, Normalizing Flows and Diffusion Probabilistic Models were recently
External link:
http://arxiv.org/abs/2307.16679
Published in:
INTERSPEECH 2023
This paper is about developing personalized speech synthesis systems with recordings of mildly impaired speech. In particular, we consider consonant and vowel alterations resulting from partial glossectomy, the surgical removal of part of the tongue.
External link:
http://arxiv.org/abs/2305.17436
Author:
Zhang, Guangyan, Qin, Ying, Zhang, Wenjie, Wu, Jialun, Li, Mei, Gai, Yutao, Jiang, Feijun, Lee, Tan
The capability of generating speech with a specific type of emotion is desired for many applications of human-computer interaction. Cross-speaker emotion transfer is a common approach to generating emotional speech when speech with emotion labels from
External link:
http://arxiv.org/abs/2206.14866
Author:
Zhang, Guangyan, Song, Kaitao, Tan, Xu, Tan, Daxin, Yan, Yuzi, Liu, Yanqing, Wang, Gang, Zhou, Wei, Qin, Tao, Lee, Tan, Zhao, Sheng
Recently, leveraging BERT pre-training to improve the phoneme encoder in text to speech (TTS) has drawn increasing attention. However, the works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent w
External link:
http://arxiv.org/abs/2203.17190
This study aims at designing an environment-aware text-to-speech (TTS) system that can generate speech to suit specific acoustic environments. It is also motivated by the desire to leverage massive data of speech audio from heterogeneous sources in T
External link:
http://arxiv.org/abs/2110.03887
Author:
Zhang, Guangyan, Leng, Yichong, Tan, Daxin, Qin, Ying, Song, Kaitao, Tan, Xu, Zhao, Sheng, Lee, Tan
In the development of neural text-to-speech systems, model pre-training with a large amount of non-target speakers' data is a common approach. However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of m
External link:
http://arxiv.org/abs/2110.03857
This paper describes a novel design of a neural network-based speech generation model for learning prosodic representation. The problem of representation learning is formulated according to the information bottleneck (IB) principle. A modified VQ-VAE
External link:
http://arxiv.org/abs/2108.02821
Author:
Yan, Yuzi, Tan, Xu, Li, Bohan, Zhang, Guangyan, Qin, Tao, Zhao, Sheng, Shen, Yuan, Zhang, Wei-Qiang, Liu, Tie-Yan
While recent text-to-speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly for two reasons: 1) the l
External link:
http://arxiv.org/abs/2107.02530
Author:
Li, Zhiyue, Zhang, Guangyan
Published in:
In Fundamental Research May 2024 4(3):642-650
This paper presents the CUHK-EE voice cloning system for the ICASSP 2021 M2VoC challenge. The challenge provides two Mandarin speech corpora: the AIShell-3 corpus of 218 speakers with noise and reverberation and the MST corpus including high-quality spee
External link:
http://arxiv.org/abs/2103.04699