Showing 1 - 10 of 141 for search: '"Zhang Zhengchen"'
Humans often speak in a continuous manner which leads to coherent and consistent prosody properties across neighboring utterances. However, most state-of-the-art speech synthesis systems only consider the information within each sentence and ignore t
External link:
http://arxiv.org/abs/2211.06170
This paper proposes an expressive singing voice synthesis system by introducing explicit vibrato modeling and latent energy representation. Vibrato is essential to the naturalness of synthesized sound, due to the inherent characteristics of human sin
External link:
http://arxiv.org/abs/2211.00996
Disentanglement of a speaker's timbre and style is very important for style transfer in multi-speaker multi-style text-to-speech (TTS) scenarios. With the disentanglement of timbres and styles, TTS systems could synthesize expressive speech for a giv
External link:
http://arxiv.org/abs/2211.00967
Author:
Zhang, Hong1 (AUTHOR) 2201210136@stu.pku.edu.cn, Zhang, Zhengchen1 (AUTHOR) zhangzhengchen@stu.pku.edu.cn, Wang, Zhenlin2 (AUTHOR) wzhenl@petrochina.com.cn, Wang, Yamin1 (AUTHOR) yamin.wang@pku.edu.cn, Yang, Rui1 (AUTHOR) 2301210152@stu.pku.edu.cn, Zhu, Tao2 (AUTHOR) zhu-tao@petrochina.com.cn, Luo, Feifei2 (AUTHOR) xjlff@petrochina.com.cn, Liu, Kouqi1 (AUTHOR) kouqi.liu@pku.edu.cn
Published in:
Fractal & Fractional. Apr 2024, Vol. 8 Issue 4, p242. 17p.
Author:
Shen, Tong, Zuo, Jiawei, Shi, Fan, Zhang, Jin, Jiang, Liqin, Chen, Meng, Zhang, Zhengchen, Zhang, Wei, He, Xiaodong, Mei, Tao
We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction, which offers real-time audio-visual responses to instant speech inquiries. Compared to traditional text or voice-based systems, ViDA-MAN offers human-like interactions (e.g., vi
External link:
http://arxiv.org/abs/2110.13384
Author:
Fu, Li, Li, Xiaoxiao, Wang, Runyu, Fan, Lu, Zhang, Zhengchen, Chen, Meng, Wu, Youzheng, He, Xiaodong
End-to-end Automatic Speech Recognition (ASR) models are usually trained to optimize the loss of the whole token sequence, while neglecting explicit phonemic-granularity supervision. This could result in recognition errors due to similar-phoneme conf
External link:
http://arxiv.org/abs/2110.04187
Author:
Liu, Kouqi, Zhang, Zhengchen, Safaei-Farouji, Majid, Fattahi, Elham, Zhang, Hong, Liu, Bo, Ostadhassan, Mehdi
Published in:
Geoenergy Science and Engineering, August 2024, 239
Author:
Wang, Yamin, Wang, Zhenlin, Zhang, Zhengchen, Yao, Shanshan, Zhang, Hong, Zheng, Guoqing, Luo, Feifei, Feng, Lele, Liu, Kouqi, Jiang, Liangliang
Published in:
Energy Reviews, June 2024, 3(2)
Although prosody is related to the linguistic information up to the discourse structure, most text-to-speech (TTS) systems only take into account that within each sentence, which makes it challenging when converting a paragraph of texts into natural a
External link:
http://arxiv.org/abs/2011.05161
In this paper, we propose an incremental learning method for end-to-end Automatic Speech Recognition (ASR) which enables an ASR system to perform well on new tasks while maintaining the performance on its originally learned ones. To mitigate catastro
External link:
http://arxiv.org/abs/2005.04288