Search results - "Zhang, Guangyan"

Report

Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models

Authors: Tian, Yusheng; Zhang, Guangyan; Lee, Tan

Published in: INTERSPEECH 2023

This paper is about developing personalized speech synthesis systems with recordings of mildly impaired speech. In particular, we consider consonant and vowel alterations resulting from partial glossectomy, the surgical removal of part of the tongue.

External link: http://arxiv.org/abs/2305.17436

Report

iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre

Authors: Zhang, Guangyan; Qin, Ying; Zhang, Wenjie; Wu, Jialun; Li, Mei; Gai, Yutao; Jiang, Feijun; Lee, Tan

The capability of generating speech with a specific type of emotion is desired for many applications of human-computer interaction. Cross-speaker emotion transfer is a common approach to generating emotional speech when speech with emotion labels from

External link: http://arxiv.org/abs/2206.14866

Report

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

Authors: Zhang, Guangyan; Song, Kaitao; Tan, Xu; Tan, Daxin; Yan, Yuzi; Liu, Yanqing; Wang, Gang; Zhou, Wei; Qin, Tao; Lee, Tan; Zhao, Sheng

Recently, leveraging BERT pre-training to improve the phoneme encoder in text to speech (TTS) has drawn increasing attention. However, the works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent w

External link: http://arxiv.org/abs/2203.17190

Report

Environment Aware Text-to-Speech Synthesis

Authors: Tan, Daxin; Zhang, Guangyan; Lee, Tan

This study aims at designing an environment-aware text-to-speech (TTS) system that can generate speech to suit specific acoustic environments. It is also motivated by the desire to leverage massive data of speech audio from heterogeneous sources in T

External link: http://arxiv.org/abs/2110.03887

Report

A study on the efficacy of model pre-training in developing neural text-to-speech system

Authors: Zhang, Guangyan; Leng, Yichong; Tan, Daxin; Qin, Ying; Song, Kaitao; Tan, Xu; Zhao, Sheng; Lee, Tan

In the development of neural text-to-speech systems, model pre-training with a large amount of non-target speakers' data is a common approach. However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of m

External link: http://arxiv.org/abs/2110.03857

Report

Applying the Information Bottleneck Principle to Prosodic Representation Learning

Authors: Zhang, Guangyan; Qin, Ying; Tan, Daxin; Lee, Tan

This paper describes a novel design of a neural network-based speech generation model for learning prosodic representation. The problem of representation learning is formulated according to the information bottleneck (IB) principle. A modified VQ-VAE

External link: http://arxiv.org/abs/2108.02821

Report

AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style

Authors: Yan, Yuzi; Tan, Xu; Li, Bohan; Zhang, Guangyan; Qin, Tao; Zhao, Sheng; Shen, Yuan; Zhang, Wei-Qiang; Liu, Tie-Yan

While recent text to speech (TTS) models perform very well in synthesizing reading-style (e.g., audiobook) speech, it is still challenging to synthesize spontaneous-style speech (e.g., podcast or conversation), mainly because of two reasons: 1) the l

External link: http://arxiv.org/abs/2107.02530

Academic article

A globally shared resource paradigm for encoded storage systems in the public cloud

Authors: Li, Zhiyue; Zhang, Guangyan

Published in: Fundamental Research, May 2024, 4(3):642-650

Report

CUHK-EE Voice Cloning System for ICASSP 2021 M2VoC Challenge

Authors: Tan, Daxin; Huang, Hingpang; Zhang, Guangyan; Lee, Tan

This paper presents the CUHK-EE voice cloning system for the ICASSP 2021 M2VoC challenge. The challenge provides two Mandarin speech corpora: the AIShell-3 corpus of 218 speakers with noise and reverberation and the MST corpus including high-quality spee

External link: http://arxiv.org/abs/2103.04699
