Showing 1 - 10 of 41 for search: '"Shiyin Kang"'
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Published in:
2022 IEEE Spoken Language Technology Workshop (SLT).
Published in:
Interspeech 2022.
Author:
Helen Meng, Hui Lu, Yuewen Cao, Shiyin Kang, Xixin Wu, Xunying Liu, Zhiyong Wu, Songxiang Liu
Published in:
IEEE/ACM Transactions on Audio, Speech, and Language Processing. 29:874-886
Expressive text-to-speech (E-TTS) synthesis is important for enhancing user experience in communication with machines using the speech modality. However, one of the challenges in E-TTS is the lack of a precise description of emotions. Previous categorical …
Published in:
Neural Networks. 125:121-130
Attention-based end-to-end speech synthesis achieves better performance in both prosody and quality than the conventional "front-end"-"back-end" structure. But training such an end-to-end framework is usually time-consuming because of the use of …
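As a point of reference, below is a minimal sketch of the additive attention step that couples the text encoder and the acoustic decoder in attention-based end-to-end synthesis. It is a toy illustration, not this paper's implementation; the class name ContentAttention and all dimensions are assumptions. Because one such weighted sum is computed per output step, decoding is inherently sequential, which is part of why training these models is slow.

import torch
import torch.nn as nn

class ContentAttention(nn.Module):
    """Additive (Bahdanau-style) attention over encoder states."""
    def __init__(self, enc_dim=256, dec_dim=256, attn_dim=128):
        super().__init__()
        self.query = nn.Linear(dec_dim, attn_dim)
        self.key = nn.Linear(enc_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, dec_state, enc_out):
        # dec_state: (batch, dec_dim) current decoder state
        # enc_out:   (batch, time, enc_dim) encoded input symbols
        e = self.score(torch.tanh(self.key(enc_out) + self.query(dec_state).unsqueeze(1)))
        w = torch.softmax(e.squeeze(-1), dim=-1)                 # alignment weights
        context = torch.bmm(w.unsqueeze(1), enc_out).squeeze(1)  # weighted sum of encoder states
        return context, w

# One decoding step: attend over 50 encoded phoneme states for a batch of 2.
context, align = ContentAttention()(torch.randn(2, 256), torch.randn(2, 50, 256))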
Previous works on expressive speech synthesis focus on modelling the mono-scale style embedding from the current sentence or context, but the multi-scale nature of speaking style in human speech is neglected. In this paper, we propose a multi-scale speaking style …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::74c5a80eb719e89e6b36e95c8bd1a110
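The multi-scale idea in the abstract above can be pictured as extracting style at more than one granularity from a reference utterance. The sketch below is an illustrative assumption, not the paper's architecture: a recurrent encoder yields frame-level (local) style vectors, and their pooled mean yields an utterance-level (global) style vector; all names and sizes are invented.

import torch
import torch.nn as nn

class MultiScaleStyle(nn.Module):
    def __init__(self, mel_dim=80, style_dim=128):
        super().__init__()
        self.local_enc = nn.GRU(mel_dim, style_dim, batch_first=True)
        self.global_proj = nn.Linear(style_dim, style_dim)

    def forward(self, mel):
        # mel: (batch, frames, mel_dim) reference speech
        local_style, _ = self.local_enc(mel)                      # fine-grained, per frame
        global_style = self.global_proj(local_style.mean(dim=1))  # coarse, per utterance
        return global_style, local_style

g, l = MultiScaleStyle()(torch.randn(2, 200, 80))  # shapes (2, 128) and (2, 200, 128)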
Previous works on expressive speech synthesis mainly focus on the current sentence. The context in adjacent sentences is neglected, resulting in an inflexible speaking style for the same text and a lack of speech variation. In this paper, we propose a hierarchical …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::dfd252d101fc31f2854460b48152f4dc
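One common way to realise such cross-sentence context, sketched below purely as an assumption (the paper's own encoder may differ), is a two-level hierarchy: a sentence-level encoder turns each neighbouring sentence into a vector, and a discourse-level encoder summarises the sequence of sentence vectors into a context vector that conditions synthesis.

import torch
import torch.nn as nn

class HierarchicalContext(nn.Module):
    def __init__(self, token_dim=256, sent_dim=256):
        super().__init__()
        self.sent_enc = nn.GRU(token_dim, sent_dim, batch_first=True)  # tokens -> sentence vector
        self.disc_enc = nn.GRU(sent_dim, sent_dim, batch_first=True)   # sentences -> context vector

    def forward(self, sentences):
        # sentences: list of (batch, tokens, token_dim), e.g. [previous, current, next]
        sent_vecs = [self.sent_enc(s)[1].squeeze(0) for s in sentences]
        _, ctx = self.disc_enc(torch.stack(sent_vecs, dim=1))
        return ctx.squeeze(0)  # (batch, sent_dim)

ctx = HierarchicalContext()([torch.randn(2, 12, 256) for _ in range(3)])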
Published in:
Interspeech 2021.
This paper describes a variational auto-encoder based non-autoregressive text-to-speech (VAENAR-TTS) model. The autoregressive TTS (AR-TTS) models based on the sequence-to-sequence architecture can generate high-quality speech, but their sequential decoding …
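The core mechanism, reduced to a toy sketch under invented names and sizes (the real VAENAR-TTS uses attention-based encoder/decoder networks and a learned prior): a posterior network infers a latent z from the target mel spectrogram during training, and the decoder reconstructs all frames in parallel from text features plus z, with no step-by-step loop.

import torch
import torch.nn as nn

class TinyVAENAR(nn.Module):
    def __init__(self, txt_dim=256, mel_dim=80, z_dim=16):
        super().__init__()
        self.post = nn.Linear(mel_dim, 2 * z_dim)       # q(z | mel), training-time posterior
        self.dec = nn.Linear(txt_dim + z_dim, mel_dim)  # decodes every frame at once

    def forward(self, text_hidden, mel):
        # text_hidden: (batch, frames, txt_dim) upsampled text features
        # mel:         (batch, frames, mel_dim) target spectrogram
        mu, logvar = self.post(mel).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterisation trick
        recon = self.dec(torch.cat([text_hidden, z], dim=-1))   # parallel, non-autoregressive
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        return recon, kl

recon, kl = TinyVAENAR()(torch.randn(2, 100, 256), torch.randn(2, 100, 80))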
Published in:
ICASSP
Text-to-speech systems can now generate speech that is hard to distinguish from human speech. In this paper, we propose the Huya multi-speaker and multi-style speech synthesis system, which is based on DurIAN and HiFi-GAN, to generate high-fidelity speech …
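A DurIAN-plus-HiFi-GAN system is a two-stage pipeline: an acoustic model maps phonemes (with predicted durations) to a mel spectrogram, and a neural vocoder maps the spectrogram to a waveform. The sketch below only shows that data flow; both model stand-ins are hypothetical placeholders, not the Huya system's code.

import torch

# Hypothetical stand-ins for the two trained networks (shapes are illustrative).
acoustic_model = lambda phonemes: torch.randn(1, 80, 400)     # stage 1: phonemes -> mel (80 bins, 400 frames)
vocoder = lambda mel: torch.randn(1, 1, mel.shape[-1] * 256)  # stage 2: mel -> waveform (hop size 256)

def synthesize(phonemes):
    mel = acoustic_model(phonemes)  # DurIAN-style acoustic model
    return vocoder(mel)             # HiFi-GAN-style vocoder

wav = synthesize(torch.tensor([[3, 14, 15, 9, 2]]))  # dummy phoneme IDs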
Author:
Peng Liu, Shiyin Kang, Songxiang Liu, Na Hu, Xunying Liu, Yuewen Cao, Dan Su, Helen Meng, Dong Yu
Published in:
ISCSLP
State-of-the-art singing voice synthesis (SVS) models can generate the natural singing voice of a target speaker, given his/her speaking/singing data in the same language. However, there may be challenging conditions where only speech data in a non-target …