Showing 1 - 10 of 335 for search: '"Wu Zhizheng"'
Author:
He, Haorui, Song, Yuchen, Wang, Yuancheng, Li, Haoyang, Zhang, Xueyao, Wang, Li, Huang, Gongping, Chng, Eng Siong, Wu, Zhizheng
One-shot voice conversion (VC) aims to alter the timbre of speech from a source speaker to match that of a target speaker using just a single reference speech from the target, while preserving the semantic content of the original source speech. Despi
External link:
http://arxiv.org/abs/2411.19770
Author:
Huang, Yiqiao, Wang, Yuancheng, Li, Jiaqi, Guo, Haotian, He, Haorui, Zhang, Shunsi, Wu, Zhizheng
In debating, rebuttal is one of the most critical stages, where a speaker addresses the arguments presented by the opposing side. During this process, the speaker synthesizes their own persuasive articulation given the context from the opposing side.
External link:
http://arxiv.org/abs/2411.06540
Author:
Ye, Junyan, Zhou, Baichuan, Huang, Zilong, Zhang, Junan, Bai, Tianyi, Kang, Hengrui, He, Jun, Lin, Honglin, Wang, Zihao, Wu, Tong, Wu, Zhizheng, Chen, Yiping, Lin, Dahua, He, Conghui, Li, Weijia
With the rapid development of AI-generated content, the future internet may be inundated with synthetic data, making the discrimination of authentic and credible multimodal data increasingly challenging. Synthetic data detection has thus garnered wid
External link:
http://arxiv.org/abs/2410.09732
Author:
Liu, Peizhuo, Wang, Li, He, Renqiang, He, Haorui, Wang, Lei, Zheng, Huadi, Shi, Jie, Xiao, Tong, Wu, Zhizheng
In recent years, speech generation technology has advanced rapidly, fueled by generative models and large-scale training techniques. While these developments have enabled the production of high-quality synthetic speech, they have also raised concerns
External link:
http://arxiv.org/abs/2409.11308
Author:
Li, Jiaqi, Wang, Dongmei, Wang, Xiaofei, Qian, Yao, Zhou, Long, Liu, Shujie, Yousefi, Midia, Li, Canrun, Tsai, Chung-Hsien, Xiao, Zhen, Liu, Yanqing, Chen, Junkun, Zhao, Sheng, Li, Jinyu, Wu, Zhizheng, Zeng, Michael
Neural audio codec tokens serve as the fundamental building blocks for speech language model (SLM)-based speech generation. However, there is no systematic understanding of how the codec system affects the speech generation performance of the SLM. In
External link:
http://arxiv.org/abs/2409.04016
Author:
Wang, Yuancheng, Zhan, Haoyue, Liu, Liwei, Zeng, Ruihong, Guo, Haotian, Zheng, Jiachen, Zhang, Qiang, Zhang, Xueyao, Zhang, Shunsi, Wu, Zhizheng
The recent large-scale text-to-speech (TTS) systems are usually grouped as autoregressive and non-autoregressive systems. The autoregressive systems implicitly model duration but exhibit certain deficiencies in robustness and lack of duration control
External link:
http://arxiv.org/abs/2409.00750
Author:
Ma, Yinghao, Øland, Anders, Ragni, Anton, Del Sette, Bleiz MacSen, Saitis, Charalampos, Donahue, Chris, Lin, Chenghua, Plachouras, Christos, Benetos, Emmanouil, Shatri, Elona, Morreale, Fabio, Zhang, Ge, Fazekas, György, Xia, Gus, Zhang, Huan, Manco, Ilaria, Huang, Jiawen, Guinot, Julien, Lin, Liwei, Marinelli, Luca, Lam, Max W. Y., Sharma, Megha, Kong, Qiuqiang, Dannenberg, Roger B., Yuan, Ruibin, Wu, Shangda, Wu, Shih-Lun, Dai, Shuqi, Lei, Shun, Kang, Shiyin, Dixon, Simon, Chen, Wenhu, Huang, Wenhao, Du, Xingjian, Qu, Xingwei, Tan, Xu, Li, Yizhi, Tian, Zeyue, Wu, Zhiyong, Wu, Zhizheng, Ma, Ziyang, Wang, Ziyu
In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models
External link:
http://arxiv.org/abs/2408.14340
Author:
He, Haorui, Shang, Zengqiang, Wang, Chaoren, Li, Xuyuan, Gu, Yicheng, Hua, Hua, Liu, Liwei, Yang, Chen, Li, Jiaqi, Shi, Peiyang, Wang, Yuancheng, Chen, Kai, Zhang, Pengyuan, Wu, Zhizheng
Recent advancements in speech generation models have been significantly driven by the use of large-scale training data. However, producing highly spontaneous, human-like speech remains a challenge due to the scarcity of large, diverse, and spontaneou
External link:
http://arxiv.org/abs/2407.05361
Recently, audio generation tasks have attracted considerable research interest. Precise temporal controllability is essential to integrate audio generation with real applications. In this work, we propose a temporal controlled audio generation frame
External link:
http://arxiv.org/abs/2407.02869
Recent advancements in audio generation have enabled the creation of high-fidelity audio clips from free-form textual descriptions. However, temporal relationships, a critical feature for audio content, are currently underrepresented in mainstream mo
External link:
http://arxiv.org/abs/2407.02857