Zobrazeno 1 - 5
of 5
pro vyhledávání: '"Li, Canrun"'
Autor:
Li, Jiaqi, Wang, Dongmei, Wang, Xiaofei, Qian, Yao, Zhou, Long, Liu, Shujie, Yousefi, Midia, Li, Canrun, Tsai, Chung-Hsien, Xiao, Zhen, Liu, Yanqing, Chen, Junkun, Zhao, Sheng, Li, Jinyu, Wu, Zhizheng, Zeng, Michael
Neural audio codec tokens serve as the fundamental building blocks for speech language model (SLM)-based speech generation. However, there is no systematic understanding on how the codec system affects the speech generation performance of the SLM. In
Externí odkaz:
http://arxiv.org/abs/2409.04016
Autor:
Wu, Haibin, Wang, Xiaofei, Eskimez, Sefik Emre, Thakker, Manthan, Tompkins, Daniel, Tsai, Chung-Hsien, Li, Canrun, Xiao, Zhen, Zhao, Sheng, Li, Jinyu, Kanda, Naoyuki
People change their tones of voice, often accompanied by nonverbal vocalizations (NVs) such as laughter and cries, to convey rich emotions. However, most text-to-speech (TTS) systems lack the capability to generate speech with rich emotions, includin
Externí odkaz:
http://arxiv.org/abs/2407.12229
Autor:
Eskimez, Sefik Emre, Wang, Xiaofei, Thakker, Manthan, Li, Canrun, Tsai, Chung-Hsien, Xiao, Zhen, Yang, Hemin, Zhu, Zirun, Tang, Min, Tan, Xu, Liu, Yanqing, Zhao, Sheng, Kanda, Naoyuki
This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, th
Externí odkaz:
http://arxiv.org/abs/2406.18009
Autor:
Eskimez, Sefik Emre, Wang, Xiaofei, Thakker, Manthan, Tsai, Chung-Hsien, Li, Canrun, Xiao, Zhen, Yang, Hemin, Zhu, Zirun, Tang, Min, Li, Jinyu, Zhao, Sheng, Kanda, Naoyuki
Accurate control of the total duration of generated speech by adjusting the speech rate is crucial for various text-to-speech (TTS) applications. However, the impact of adjusting the speech rate on speech quality, such as intelligibility and speaker
Externí odkaz:
http://arxiv.org/abs/2406.04281
Autor:
Kanda, Naoyuki, Wang, Xiaofei, Eskimez, Sefik Emre, Thakker, Manthan, Yang, Hemin, Zhu, Zirun, Tang, Min, Li, Canrun, Tsai, Chung-Hsien, Xiao, Zhen, Xia, Yufei, Li, Jinzhu, Liu, Yanqing, Zhao, Sheng, Zeng, Michael
Laughter is one of the most expressive and natural aspects of human speech, conveying emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the ability to produce realistic and appropriate laughter sounds, limiting their a
Externí odkaz:
http://arxiv.org/abs/2402.07383